WO2024163394A1 - Enabling service data protection through discovery, r-graph propagation andapi model - Google Patents
Enabling service data protection through discovery, r-graph propagation andapi model Download PDFInfo
- Publication number
- WO2024163394A1 WO2024163394A1 PCT/US2024/013443 US2024013443W WO2024163394A1 WO 2024163394 A1 WO2024163394 A1 WO 2024163394A1 US 2024013443 W US2024013443 W US 2024013443W WO 2024163394 A1 WO2024163394 A1 WO 2024163394A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- service
- resource
- saas
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5058—Service discovery by the service manager
Definitions
- This patent application relates to methods and systems for determining data resiliency, including data protection and compliance status, for complex data processing environments. It further relates to a structured way to describe any Service (e.g., SaaS, PaaS, DBaaS, IaaS) related data and configuration and the associated data resilience management methods to enable the ability to protect and recover data with minimal complexity.
- Data protection is the process of safeguarding important information from corruption, compromise, or loss.
- Enterprises such as businesses, organizations, universities and other large groups of users also have little tolerance for downtime that can make it impossible to keep the enterprise running.
- Cloud Services e.g., SaaS (software as a service), PaaS (Platform as a Service), DBaaS (Database as a Service), IaaS (Infrastructure as a Service)
- the service provider may itself provide for data backup and maintenance, including data protection, which frees the business’ own staff from complex software and hardware management.
- Data Protection as a Service allows organizations to reduce risk and shift from owning and maintaining backup infrastructure to simply accessing and utilizing it in a pay-as- you-go model. They choose how much compute, networking, and storage they might need based on previous workloads, with the ability to scale when demand changes. They also specify encryption, retention, and security policies as part of their lease and leave backup storage planning and deployment to the data protection vendor.
- Every organization has 10s to 100s to 1000s of applications, services, systems, data sources and other data processing resources that support their operations.
- the ability to understand whether the data processing resources used by an organization are protected or compliant is a continuing challenge.
- the status of such entities is not a simple “good/bad” parameter (especially from a data protection perspective). For example, data critical to an organization may be protected, and even most workloads might be protected. However, when other workloads have failed, reliance on the status of data backup alone may lead to an incorrect conclusion that the business as a whole is protected.
- a final data protection policy may consist of multiple layers. These data protection layers may include SaaS internal backup, separate Data Protection as a Service (DPaaS) products (such as those provided by HYCU, Inc.), other methods for data replication, archiving, and the like. More generally, business operations are considered to be protected when a Service Level Objective (SLO) is achieved, regardless of the implementation specifics. Thus, the actual status of an enterprise's data protection is based on what the expectations are.
- the solution should take into consideration that: a) even if applications/data sources are not protected they may or may not be critical; b) just because an important resource is not protected or not in compliance, does not mean the organization needs to panic — because other alternate sources might already be providing a solution; and c) modern applications and services are usually a complex distributed architecture of various services and data across different technology stacks (and/or computing environments or public/private clouds) — thus, individual items' protection/compliance does not mean the protection of the required application/service.
- the approach taken here determines the data protection and compliance status for an organization's data processing resources on a per-organization-unit basis.
- the protection and compliance status is determined across the entire data processing environment, including hosted applications (which may include combinations of physical or virtual machines, databases, and storage devices) and cloud services (which may include workloads, data storage, and services on SaaS or other services such as those provided by Google).
- the approach provides insight into the status of the overall organization (e.g., top-level corporate) or at other (e.g., department or operational) organization levels.
- an approach to data protection leverages an Identity Provider (IdP) service to discover SaaS/DBaaS/PaaS or other services, and then automatically applies an appropriate data protection scheme for such services.
- the approach brings a new level of SaaS awareness to the modern and complex multi-cloud environments by enabling automatic detection of SaaS services that are hosted outside of company infrastructure and to then interact with them.
- Such interactions may include, for example, confirming how data protection attributes are configured or other interactions.
- the automatic SaaS discovery process leverages a user authentication service such as an Identity Provider (IdP) service.
- Automatic SaaS discovery can be triggered either as a scheduled job or as a response to an external event (for example, when a new virtual SaaS service is integrated within the IdP Single Sign-On (SSO) service).
- the only action required by the user is to provide IdP credentials for the discovery process to access the remote IdP service(s).
- end users are able to connect to their identity management provider(s), which will then automatically gather the SaaS/DPaaS/PaaS services used by their organization.
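- As a concrete illustration of this discovery step, the following is a minimal sketch, assuming an Okta-style IdP that lists provisioned applications over REST; the endpoint path, token header, and response field names are assumptions for illustration, not a definitive integration.

```python
import requests

def discover_saas_services(idp_base_url: str, api_token: str) -> list[dict]:
    """Return the SaaS applications currently integrated with the IdP's SSO."""
    resp = requests.get(
        f"{idp_base_url}/api/v1/apps",                   # assumed list-apps endpoint
        headers={"Authorization": f"SSWS {api_token}"},  # assumed token header format
        timeout=30,
    )
    resp.raise_for_status()
    # Keep only active applications; 'label' and 'status' are assumed field names.
    return [
        {"id": app.get("id"), "name": app.get("label")}
        for app in resp.json()
        if app.get("status") == "ACTIVE"
    ]
```

- Such a routine could run as a scheduled job or in response to an IdP event; the only user input needed is the IdP credential (here, an API token).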
- a processor may generate a model of data processing resources within a data processing environment.
- a “data processing resource” may include any feature, service, product, or attribute of the service to which a policy may be assigned.
- a data resilience-graph (R-graph) may be generated based on the model, where the R-graph includes an object for each resource group, where for example, each resource group may include one or more resources. Each resource group may be represented by a leaf in the R-graph. The object for each resource group may further include at least a compliance or protection attribute and a criticality attribute.
- a processor may display a domain-specific view by applying compliance and criticality rules to the objects in the R-graph.
- the techniques described herein relate to a method for assessing data resilience status of a data processing environment including: generating a model of data processing resources within the environment; generating a data resilience-graph (R-graph) based on the model, the R-graph including an object for each resource, the object for each resource further including at least a compliance attribute and a criticality attribute; applying compliance and criticality rules to the objects in the R-graph; and displaying a domain-specific view of the R-graph.
- the techniques described herein relate to an apparatus for assessing compliance status of a data processing environment including: one or more data processors; and one or more computer readable media including instructions that, when executed by the one or more data processors, cause the one or more data processors to perform a process for: generating a model of data processing resources within the environment; generating a data resilience-graph (R-graph) based on the model, the R-graph including an object for each resource, the object for each resource further including at least a compliance attribute and a criticality attribute; applying compliance and criticality rules to the R-graph; and displaying a domain-specific view from the resulting R-graph.
- an integrator/developer/administrator accesses a platform that provides a facility to define the way that data are protected by a service.
- the platform can be used, for example, to determine whether the SaaS itself provides a recovery method, and at which level of granularity those recovery method(s) are provisioned. This information can then be stored in a metadata catalog, to record what "level" of recovery the SaaS itself can provide as well as the specific configuration deployed for each end user.
- a platform we call the R-Cloud Platform provides an easy way to specify the data resilience (including backup and/or recovery) workflow.
- the platform provides the facility for an integrator/developer to define the way the configuration and data are held by the service. This definition is flexible to accommodate a wide range of SaaS applications.
- the platform allows the service developer to provide a simple abstraction of how to back up and recover the different parts of the service. This allows the platform to support different levels of granularity of recovery for every service.
- the platform also provides a way to maintain the varying types of metadata associated with the granularity of the data being protected and to leverage that metadata for granular recovery.
- the service definition and resilience methods are orchestrated and leveraged by the R-Cloud Platform to deliver data protection for every aaS that integrates with it.
- the platform delivers a significant amount of capability, including but not limited to: backup data management / retention / copies
- This innovation enables the as-a-Service integrator/developer to start with just providing two sets of structured information to the platform to create data resilience for the Service.
- the two are:
- the Service Data Definition is intended for the integrator/developer of the as-a-Service to define the following (including, but not limited to): the different levels of hierarchy of the resources within the as-a-Service, such as different data objects or groups of data objects; a description of the data objects; the type of data in each of the objects; whether the objects have associations with other objects defined in the structure; the ability to protect the object; and the sequence in which the data needs to be backed up and recovered.
- the Service Data Management can have multiple parts, but the minimum required are the Backup and Restore methods.
- the as-a-Service Integrator/Developer can define the required processing method to protect that part of the object from the service. This applies to both backing up and restoring.
- the platform leverages the Service Data Definition to discover the internals of the data stored in the service, provide backup methods for different data (resource) types as defined by the integrator and provide a User Interface driven restore (restore scenario definition) of the data specific to that Service by invoking the right associated methods.
- Fig. 1 is an overview of an example SaaS awareness implementation.
- Fig. 2 is an example of discovered SaaS services/applications.
- Fig. 3 is an overview of an example core data structure for SaaS service/application discovery.
- Fig. 4 is a flow diagram for an automated discovery and data protection process.
- Fig. 5 shows a model of a data processing environment that is augmented with a Data Resilience Graph (R-Graph).
- Fig. 6 is a typical data processing environment in more detail.
- Fig. 7 is an example architecture of a complex application.
- Fig. 8 is one example of R-Graph attributes.
- Fig. 9 illustrates example attributes of R-Graph entities.
- Fig. 10 is a table that describes the complete logic for propagation of the protected status in the environment.
- Fig. 11 is a resulting display of compliance illustrating criticality.
- Fig. 12 is an example high level flow for applying the propagation logic.
- Fig. 13 shows the access points and data organization for an example Google Cloud SQL SaaS application.
- Fig. 14 shows the access points and data organization for a more complex SaaS application, such as DropBox.
- Fig. 15 shows an R-Cloud Platform that manages a catalog that represents the data organization for a SaaS.
- Fig. 16 is an example of how the R-Cloud Platform interacts with a data catalog.
- Fig. 17 is a flow diagram of a method that may be used to implement data protection as described herein.
- Fig. 1 illustrates an example data processing environment 100 where a process for automated discovery of SaaS services and/or applications may be implemented.
- the environment 100 may be a typical enterprise such as a business, university, organization, or other group of individual users that access a set of SaaS services and/or applications 130-1, 130-2, ..., 130-n.
- the SaaS 130 may include SalesForce, CloudSQL, DropBox, and other SaaS services/applications. It should be understood, however, that other enterprises are different and that different or additional SaaS services/applications 130 may be deployed.
- the enterprise utilizes an identity provider (IdP) service 120, such as one using Single Sign On (SSO) (like Okta or Azure AD (Entra)), to control access to the SaaS services and/or applications 130.
- SSO and similar IdP services 120 permit each user to use one set of login credentials (for example, a username and password) to access multiple SaaS services/applications 130, simplifying the management of multiple login credentials.
- the SaaS awareness function (referred to herein as the "R-Cloud Platform" 110) accesses the IdP 120 to in turn access the SaaS 130-1, 130-2, ..., 130-n in order to discover which SaaS services are in use by the enterprise.
- the R-Cloud Platform 110 may include the following operations:
- SaaS service discovery: remotely detecting whether a SaaS service 130 is provisioned and running.
- an R-Cloud Module 140-1, 140-2, ..., 140-n may be specifically designed for each SaaS service 130-1, 130-2, ..., 130-n.
- the R-Cloud Platform exposes a set of interfaces and types and preferably enforces a common hierarchy and uniformity of SaaS-specific implementations within the different R-Cloud Modules 140. For example, there may be an R-Cloud Module 140-1 for Salesforce, a different R-Cloud Module 140-2 for Dropbox, etc.
- the internal implementation for each application probe can be done in any way that best suits the needs of the particular SaaS service.
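- A minimal sketch of such a common module interface follows, assuming a Python plug-in model; the method names (list_resources, backup, restore) are illustrative and not the platform's actual API.

```python
from abc import ABC, abstractmethod

class RCloudModule(ABC):
    """Common interface enforced across SaaS-specific R-Cloud Modules."""

    @abstractmethod
    def list_resources(self, credentials: dict) -> list[dict]:
        """Discover the service's resources and their attributes (the LIST operation)."""

    @abstractmethod
    def backup(self, resource: dict, target: str) -> str:
        """Back up one resource; return an identifier for the backup copy."""

    @abstractmethod
    def restore(self, backup_id: str, resource: dict) -> None:
        """Restore a resource from a previously created backup."""

class DropboxModule(RCloudModule):
    # The internal implementation can be done in whatever way best suits the
    # particular SaaS, as long as the common interface above is honored.
    def list_resources(self, credentials): ...
    def backup(self, resource, target): ...
    def restore(self, backup_id, resource): ...
```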
- the R-Cloud (SaaS) Module may be deployed as an application running within the enterprise or as an external service.
- the modules can thus be deployed in two modes:
- the first or discovery stage 112 of SaaS awareness therefore consists of remotely detecting if a SaaS service 130 is integrated within the customer environment. This can be done by leveraging an IdP 120 Single Sign-On (SSO) service used by the enterprise.
- SSOs may include OKTA and Azure. On OKTA and Azure AD, this information can be retrieved through respective REST APIs.
- Fig. 2 is an example map of resulting discovered SaaS services. If the SaaS service is not integrated within the customer’s SSO service, it can instead be added manually via a user interface to the list of discovered SaaS services.
- the list of discovered services/applications may carry attributes such as a "name" associated with each service, and a graphically indicated status of various features, such as compliance of the service, protection, and discovery.
- information reflecting the discovered services is arranged in a tree referred to herein as an R-graph 200.
- the enterprise, called HYCU, uses a mix of Software as a Service (SaaS) resources and hosted resources.
- the Engineering department 202 uses Jira 203, Confluence 204, and GitHub 205 services they access as SaaS; the Finance Department 210 uses Navision 211 and Tipalti 212; Legal 220 uses Docusign 221 and a shared data repository 222; the Sales Department 230 uses SalesForce 232, and a couple of hosted resources (a Demo Data Center 234 and Demo Cloud 236), and Operations 240 does not yet have any managed resources.
- the second stage of SaaS awareness involves gathering SaaS-application-specific information. These attributes may be discovered during a LIST operation (implemented by each R-Cloud Module 140) on the respective SaaS application 130.
- the R-Cloud Platform 110 includes a service data management function that discovers service attributes, stores them, and then uses that information to drive backup and restore workflows and optional attributes.
- Fig. 3 is an overview of an example core data structure that may be used by the R-Cloud Platform 110 to implement SaaS service/application discovery.
- discovery of a SaaS 130 is performed to determine if it has a corresponding backup method, restore method, configuration method, status method, and other information, such as lists of required attributes and optional attributes. The specifics of each method and list of attributes differ depending on the type of SaaS 130.
- the R-Cloud Platform includes an R-Cloud Manager 310 component, a Service Data Definition 320, and the R-Cloud Modules 140.
- Each R-Cloud Module 140 is programmed to access its associated SaaS application 130 such as through an Application Programming Interface (API) 325.
- the Service Data Definition consists of resource objects 340 which correspond to the attributes of a corresponding discovered SaaS application 130. These attributes may be discovered such as during a LIST operation on the SaaS application. Each such LIST operation may return a list that describes certain aspects of the structure of the SaaS application.
- the structure may identify a list of required attributes that the R-Cloud platform 110 will then use to drive backup and restore methods, as well as an optional list of attributes that are meaningful only to the module.
- these discovered attributes 350 may include values for an identifier 351, name 352, and type 353 of the SaaS 130. Also included are attributes such as whether or not it has other related dependent or subservient services 354, provides its own backup method 355, defines a backup sequence 356, or defines a restore sequence 357. Still other attributes may include whether it can display metadata 358, its location 359, and other metrics 360.
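- By way of a hedged illustration, a LIST operation might surface these attributes in a record like the one below; the key names follow the description above but are assumptions rather than the platform's exact schema.

```python
# One discovered SaaS resource as it might appear after a LIST operation.
discovered_service = {
    "id": "saas-0042",                # identifier 351 (hypothetical value)
    "name": "CloudSQL-production",    # name 352
    "type": "DBaaS",                  # type 353
    "hasSubResources": True,          # related dependent/subservient objects 354
    "canBackup": True,                # the service provides its own backup method 355
    "backupSequence": 1,              # backup sequence 356
    "restoreSequence": 1,             # restore sequence 357
    "canDisplayMetadata": True,       # metadata display 358
    "location": "europe-west1",       # location 359
    "metrics": {"storage_gb": 120, "subscribed_users": 35},  # other metrics 360
}
```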
- the R-Cloud Platform has now discovered the extent of the default data protection built into a Service. This allows the end users to determine what is already present in the system and what additional data protection they may desire.
- An example of a required attribute is the »canBackup« attribute, which indicates to the R-Cloud platform 110 that a SaaS implements a backup method.
- Example optional attributes may further define the »canBackup« attribute to specify the one or more levels of a hierarchical resource at which backup protection is or is not implemented.
- the »hasSubResources« attribute can be set to True.
- the child resources may be further defined as optional attributes, such as a list of cloud SQL servers, a list of SQL instances running on each server, a list of databases running on each SQL instance, and a list of tables in each database.
- the optional attributes may further specify a »canBackup« attribute for each object in the list, such that it can be determined whether each server, instance, database, and table can or cannot be backed up at its corresponding level.
- optional child attributes of a DropBox SaaS may include a file structure hierarchy including top level personal / public / shared folders, a subfolder under each such top level folder, and then files within each subfolder.
- the optional attributes may thus specify whether this particular DropBox resource can be backed up, or not, at each level of the top level/subfolder/file hierarchy.
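- The hierarchy described above might be expressed as nested child attributes along the following lines; this is a sketch under assumed key names, using the Cloud SQL example, not a definitive schema.

```python
# Per-level backup capability for a Cloud SQL hierarchy
# (server -> instance -> database -> table); flags are illustrative.
cloud_sql_definition = {
    "type": "cloud_sql_server",
    "canBackup": False,              # the server itself is not a backup unit
    "hasSubResources": True,
    "children": [{
        "type": "sql_instance",
        "canBackup": True,           # whole instances can be backed up
        "hasSubResources": True,
        "children": [{
            "type": "database",
            "canBackup": True,       # individual databases can be backed up
            "hasSubResources": True,
            "children": [{
                "type": "table",
                "canBackup": False,  # tables cannot be selected separately
                "hasSubResources": False,
            }],
        }],
    }],
}
```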
- the child attributes therefore enable the discovery service to be adapted to different use cases.
- the configuration method may include configuration options and UI attributes (such as access credentials) for implementing the actual backup and restore methods.
- Fig. 4 is a high level flow chart for an example automated service discovery and data protection process that may be implemented within the system above.
- in a first step 402, access is provided to an authentication service.
- in a next step 404, that authentication service is queried to automatically discover the services that have been provisioned.
- in step 406, further details about the data configuration for each service are discovered. As explained above, this may include a service version, identification of logical entities within the service, or respective data and metadata being hosted within the service.
- in step 408, further options regarding the service can also be retrieved. As explained above, this may include things such as storage consumption, the number of subscribed users, and other attributes that may assist with data protection.
- an R-Cloud module for a SQL database service will perform different functions from an R-Cloud module for a Dropbox service.
- an appropriate R-Cloud plugin for each service is invoked to discover service-specific attributes.
- this second stage of discovery determines, for example in step 412, the data protection attributes of each service, such as whether the service has a »can_backup« attribute.
- Attributes of the user's configuration of each service are then discovered in step 414.
- if the service is a SQL database service, then information regarding parent-child databases can be retrieved, including whether each database can be backed up or restored, or to what extent backup and restore operations can be handled by the service.
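- Putting the two discovery stages together, an orchestration loop along the following lines is one possible reading of this flow; it assumes the hypothetical module interface sketched earlier, and the field names are illustrative.

```python
def run_discovery(discovered_services: list[dict], modules: dict) -> list[dict]:
    """Stage two of discovery: for each service found via the IdP (stage one),
    invoke its R-Cloud module's LIST operation and record protection attributes."""
    catalog_entries = []
    for service in discovered_services:   # e.g., output of the IdP query sketched earlier
        module = modules.get(service["name"])
        if module is None:
            continue  # no R-Cloud module yet; the service can be added manually instead
        for resource in module.list_resources(credentials={}):
            catalog_entries.append({
                "service": service["name"],
                "resource": resource,
                "can_backup": resource.get("canBackup", False),
            })
    return catalog_entries
```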
- Fig. 5 shows how a model 5-100 of a data processing environment 5-102 can be augmented with a Data Resilience Graph ("R-Graph") 5-104 and a domain-specific viewer 5-106.
- a typical enterprise’s data processing environment 5-102 may consist of a wide variety of resources such as hosted applications, services, public/private clouds, servers, storage, databases, processors, and many other types of data processing elements.
- a model 5-100 of these resources may be developed and maintained in different ways such as via Simple Network Management Protocol (SNMP), Common Information Model (CIM), or other methods that define how the managed resources in an IT environment 5-102 are represented as a common set of objects and relationships between them.
- the Model 5-100 may include the ability to collect the compliance and protection status of the managed resources. As is known in the art, this status information may be automatically discovered via agents, plug-ins, Application Programming Interfaces (APIs), and the like installed in the managed resources. However this information may also be collected in other ways, such as manually. The collected status information is then stored in a Data Resilience Graph ("R-Graph") data structure 5-104, an example of which is discussed in more detail below.
- the R-Graph 5-104 may be augmented with criticality information 5-108 that is further processed by rules we call propagation logic 5-110.
- the criticality 5-108 for a particular resource may differ depending on the perspective of different domains, such as departments or functions, within the enterprise. Thus a given resource may have different criticality values for different domains.
- a viewer application 5-106 then provides a display of one or more aspects of the R-Graph to an IT manager or other user after application of the propagation logic 5-110.
- the display generated by the viewer 5-106 is tailored to the specific domain of interest. For example, the user may only be interested in the compliance status from the perspective of a particular department in an enterprise.
- a typical modern company has dozens of complex business applications and services which are at the heart of the business availability status. Some of them are critical to the operation of the business, some of them have a standard importance, and some of them have almost no impact on the core business. Thus, a resource considered critical by one department may not be critical for another. Or perhaps upper level management prefers to know the status of the organization as a whole, and considers all of the systems that support a given department (such as sales or manufacturing) more critical than another department's systems (such as engineering).
- Fig. 6 shows an example R-graph 6-200 for an example enterprise 201 called HYCU that uses a mix of Software as a Service (SaaS) resources and hosted resources.
- the Engineering department 6-202 uses Jira 6-203, Confluence 6-204, and GitHub 6-205 services they access as SaaS;
- the Finance Department 6-210 uses Navision 6-211 and Tipalti 6-212;
- Legal 6-220 uses Docusign 6-221 and a shared data repository 6-222;
- the Sales Department 6-230 uses SalesForce 6-232, and a couple of hosted resources (a Demo Data Center 6-234 and Demo Cloud 6-236), and Operations 6-240 does not yet have any managed resources.
- the user's mouse is hovering over the Tipalti 6-212 resource, and the user can see it was last backed up on 30 August.
- the checkmarks next to the different resources indicate their compliance status; an “x” indicates a resource that is not in compliance (e.g., Navision).
- a “shield” next to a resource may indicate the data protection status for the node, and a “dot” next to a resource may indicate its compliance status.
- Fig. 7 is an example architecture of a complex hosted application 7-300. It consists of a set of resources including a pair of application servers 7-302-1, 7-302-2, a pair of load balancers
- the Model 5-100 collects this data protection, compliance, and criticality information for each resource.
- the SLO may specify that it is sufficient if only one of the replicas is protected.
- the resulting R-Graph propagation logic 5-110 may determine that the application is protected if Replica 1 OR Replica 2 is protected.
- the SLO may specify the need for both Replica 1 AND Replica 2 to be protected for the application as a whole to be considered to be protected.
- the SLO may consider the Replicas to be considered “critical”, but still specify that Replica 1 “OR” Replica 2 is sufficient since they are replicas and not the Master Database.
- Every organization's protectable data processing resources can thus be described by tracking the entities shown in the example table 8-400 of Fig. 8 in an R-Graph 5-104.
- the type of source object can be specified as various types (e.g., virtual machine (VM), fileserver, container, SaaS service, application, etc.).
- Groups of resources can also be defined and the groups can be nested.
- a "VMs" group consists of the two virtual machines (vm1, vm2) and an "application 1" group consists of the VMs group, a SalesForce SaaS, and a CloudSQL service.
- Fig. 9 illustrates a table 9-500 that is but one example of the possible attributes of R-Graph entities in more detail.
- a source object may have further attributes other than just a type, such as
- the R-Graph may also implement certain rules for propagation of the individual protected and compliance statuses of "child" members of a larger hierarchy or group.
- rules for “Propagation of Protected Status” may include:
- Fig. 10 is an example of a table 10-600 that describes the propagation logic 5-110 for propagation of protected status, depending on the criticality of the resource.
- example rules for propagation of compliant status may include:
- Criticality may include Critical (Protected) and Standard (Not Protected) and a Propagation rule may include Protected with Warnings.
- the logic may include other conditions.
- the user may be able to set criteria (such as equal to or greater than 50% of child resources) to trigger non-compliance or a warning to a higher level.
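- One way to read these propagation rules is as a function from child statuses (and their criticality) to a parent status. The sketch below is a simplified, assumed rule set for illustration: a failing critical child, or a user-set share of failing children (here 50%), marks the parent non-compliant, while only standard failures below the threshold produce a warning; the actual rules are configurable.

```python
def propagate_compliance(children: list[dict], failure_threshold: float = 0.5) -> str:
    """Derive a parent's compliance status from its children's statuses."""
    if not children:
        return "compliant"
    failed = [c for c in children if not c["compliant"]]
    if any(c["criticality"] == "critical" for c in failed):
        return "non-compliant"
    if len(failed) / len(children) >= failure_threshold:
        return "non-compliant"
    return "compliant-with-warnings" if failed else "compliant"

# With the 50% threshold, a critical/compliant child plus a standard/
# non-compliant child resolves to non-compliant at the parent,
# matching the first example of Fig. 11.
status = propagate_compliance([
    {"compliant": True,  "criticality": "critical"},
    {"compliant": False, "criticality": "standard"},
])  # -> "non-compliant"
```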
- Fig. 11 is a resulting R-Graph 11-700 that shows how Criticality and Compliance propagate for a particular organization’s resources.
- the rectangles represent managed resources, each of which may represent individual or groupings of applications or services, or a department's managed resources, or groups of applications.
- criticality is indicated by the color of each rectangle, with blue indicating standard criticality and orange indicating critical. Otherwise the color of each rectangle is indicated in the words next to it.
- the view may be of a particular department’s resources or the entire enterprise.
- a "critical/compliant" resource 11-710 (shown with an orange rectangle labeled "C") and a "standard/non-compliant" resource 11-711 (shown with a blue rectangle labeled "N") propagate to a "critical/non-compliant" resource 11-712 (an orange rectangle labeled "N").
- a pair of “critical/compliant” resources 11- 720, 11-721 propagate to “critical/compliant” resource 11-722.
- Level 3 11-703 is a department that has a standard/compliant resource 11-730 and a standard/non-compliant resource 11-731 but which propagates to a “critical/compliant” resource 11-732 because of the nature of the department that Level 3 supports (perhaps it is the Sales department).
- the propagation logic 5-110 similarly processes the other Levels 11-704, 11-705, 11-706, 11-707 to generate the resulting overall view 11-700.
- the overall view 11-700 of the propagated statuses of the entire organization exposes an overall non-compliant/critical status 11-760, which may prompt some immediate action.
- "Protection" and "compliance" propagation may not be absolute (e.g., they need not always resolve to a "yes" or "no" answer).
- different weightings or scales may be configured at different levels.
- a process for generating an R-Graph representation of the Criticality and Compliance status of a particular enterprise’s data resources can now be appreciated.
- One such example process is shown in Fig. 12.
- a model of the data processing resources in an enterprise is generated.
- in step 12-804, an R-Graph is generated from the model as explained above.
- in step 12-806, this involves, for each object in the R-Graph, determining a criticality attribute (step 12-808) and a compliance attribute (step 12-810).
- in step 12-812, other attributes for the object(s) may also be determined.
- in step 12-814, these attributes are recorded for the object, and processing returns to step 12-806 until all objects are processed.
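- A minimal sketch of this loop follows, assuming the model exposes its resources as a tree of dictionaries and that criticality and compliance can be looked up per resource; the node layout is illustrative, not the patent's actual data structure.

```python
from dataclasses import dataclass, field

@dataclass
class RGraphNode:
    name: str
    criticality: str = "standard"      # "critical" or "standard"
    compliant: bool | None = None      # None until discovered or propagated
    attributes: dict = field(default_factory=dict)
    children: list["RGraphNode"] = field(default_factory=list)

def build_r_graph(model_resource: dict, get_criticality, get_compliance) -> RGraphNode:
    """Walk the model and record criticality, compliance, and other attributes per object."""
    node = RGraphNode(name=model_resource["name"])
    node.criticality = get_criticality(model_resource)       # step 12-808
    node.compliant = get_compliance(model_resource)          # step 12-810
    node.attributes = model_resource.get("attributes", {})   # step 12-812
    for child in model_resource.get("children", []):         # repeat until all objects processed
        node.children.append(build_r_graph(child, get_criticality, get_compliance))
    return node
```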
- a typical SaaS service operates in a computing environment that deploys one or more access points to users via an Application Programming Interface (API).
- the access points consist of various resources behind which data is organized in a hierarchical fashion.
- Fig. 13 shows the access points 13-100 and data organization 13-110 for an example Google Cloud SQL SaaS application 13-120.
- This example has access points that include a relational database instance 13-102 that includes three relational databases (Database #1 13-104-1, Database #2 13-104-2, and Database #3 13-104-3); each database 13-104 may be a separate access point.
- the databases 13-104 are each further organized into sets of tables 13-106, with each database having a certain number of tables.
- database 13-104-1 has tables 13-106-1-1 and 13-106-1-2.
- the databases 13-104 do not all have the same number of tables.
- Each table 13-106 may include structured rows of billions of data objects. Looking at this from a data protection perspective, granular recovery is not required by this particular user (that is, recovery of individual specific tables), because of complex relationships between the data in the rows/columns. Recovery of the entire database is sufficient.
- Fig. 14 shows the access points 14-200 and data organization 14-210 for a more complex SaaS application 14-220, such as DropBox.
- This DropBox application 14-220 hosts data for several different Projects 14-201-1, 14-201-2, 14-201-3, 14-201-4, 14-201-5; the data for each project 14-201 is further organized in folders 14-203 and files 14-205 (or other objects such as databases 14-202) in a hierarchical directory.
- the files 14-205 are typically organized such that there is a folder 14-203 for each project 14-201, and each project 14-201 in turn has data stored in many subfolders.
- Each subfolder may contain files, databases, or other objects related to some specific aspect of its associated project (here, project 14-201-1 is indicated as having individual databases of .PNG files).
- the data service points thus also correspond to different levels of this hierarchy, such as the different Projects (folders) and subfolders.
- Granular recovery, say of each project or even of each individual file, is important in this particular access point instance for this particular user.
- the data organization 14-210 indicates that requirements for data protection include: backup of the whole service or only a selected project
- a metadata catalog is maintained that reflects the one or more levels at which recovery is needed.
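- As a hedged illustration, the catalog entries for the two examples above might record the supported recovery levels per access point along these lines; the key names are assumptions for illustration.

```python
# Recovery granularity recorded per access point (illustrative schema).
catalog = {
    "cloudsql-instance-1": {
        "service": "Google Cloud SQL",
        "recovery_levels": ["database"],      # per Fig. 13: whole databases are sufficient
        "granular_recovery_required": False,  # table-level recovery not needed by this user
    },
    "dropbox-team-space": {
        "service": "DropBox",
        "recovery_levels": ["project", "file"],  # per Fig. 14: projects and individual files
        "granular_recovery_required": True,
    },
}
```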
- a platform called the R-Cloud Platform 15-310 manages a catalog 15-320 that represents the data organization 13-110, 14-210 for each SaaS 15-340-1, 15-340-2, 15-340-3 in use by the enterprise.
- the R-Cloud platform may be accessed by a user 15-350 via an API 15-360; in addition, each SaaS provides the aforementioned access points via their own respective APIs 15-335-1, 15-335-2, 15-335-3.
- the catalog 15-320 stores metadata regarding the organization and attributes of the data objects within one or more SaaS services operated by an enterprise.
- the catalog 15-320 is maintained by the R-Cloud platform 15-310 in cooperation with a number of plug-in modules 15-330. There is, for example, an R-Cloud plug-in module 15-330 for each SaaS 15-340.
- since an R-Cloud module 15-330 is provided for each SaaS application 15-340, the end user 15-350 need only interact with the R-Cloud platform 15-310, such as via the API 15-360.
- the developer of the R-Cloud plug-in module 330 for a given SaaS can determine whether the SaaS itself provides a recovery method, and at which level of granularity those recovery method(s) are provided. This information can then be recorded in the metadata catalog 15-320 via the R-Cloud plug-in modules 330.
- each item in the catalog be it a set of data objects (such as database) or even a single data object (such as an individual file) has an associated data protection attribute.
- the data protection attribute may be a »canBackup« attribute that indicates whether the object can be independently backed up, such as by accessing the APIs of the respective SaaS service 340.
- the R-Cloud Modules 330 thus access the catalog 320 to determine how to implement data protection for each object.
- a data object that can be backed up by the service itself (such as a database in a CloudSQL application) may have its »canBackup« attribute set to True.
- each object such as a database may have its »canBackup« property set to True.
- the R-Cloud Platform 15-310 can invoke the R-Cloud plug-in 15-330-1 and then know to concurrently call the SaaS-specific backup method via the Cloud SQL 15-340-1 API 15-335-1. This frees the end user from having to know the backup or other data protection capabilities of each specific SaaS.
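- A sketch of this decision point follows: the platform looks up the object's »canBackup« attribute in the catalog and routes the work accordingly. The function and field names are illustrative; the real dispatch is internal to the platform.

```python
def backup_object(catalog: dict, object_id: str, modules: dict) -> str:
    """Back up one cataloged object without the end user knowing the SaaS specifics."""
    entry = catalog[object_id]
    module = modules[entry["service"]]      # the R-Cloud module for that SaaS
    if entry.get("canBackup"):
        # The SaaS provides a native backup method (the CloudSQL case): the module
        # simply triggers it through the service's own API access point.
        return module.backup(entry, target="saas-native")
    # Otherwise the module manages the backup workflow itself (the DropBox case),
    # using whatever workflow was recorded in the catalog for this object.
    return module.backup(entry, target="r-cloud-store")
```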
- the metadata catalog may have only a single attribute.
- the metadata catalog may have many attributes that are exposed via API 15-360 to the end user / developer 15-350.
- the R-Cloud platform 15-310 may enable end users / developers 15-350 to browse the catalog 15-320 via the R-Cloud Modules 15-330 by invoking a List operation to discover the backupable items.
- the end user/developer 15-350 may then further augment the metadata in the catalog 15-320 with information that describes a backup workflow that is outside the SaaS APIs themselves.
- Information (metadata or attributes) about the items/data contained within each Project 14-201 at the backup time will be stored in the catalog 15-320 by the R-Cloud Module 15-330 together with backup data during the backup operation. All of the items below the Project level 14-201 (such as folders 14-203 and files 14-205) have their »canBackup« attribute set to False, meaning that they cannot be separately selected/unselected for backup.
- the user 15-350 can select a specific project 14-201 (and its backup version) and invoke a browse option (which operates a list method of the R-Cloud Module 15-330-2 for the SaaS 15-340-2). This enables the user to discover which items were also actually backed up when the respective Project was backed up by the SaaS.
- an available data restore workflow operation (outside of the SaaS 15-340-2 itself) may be individually specified by the user via the API 15-360 and recorded in the catalog 15-320. This workflow can then be invoked by the R-Cloud platform 15-310 via the R-Cloud Plug-in module 15-330-2 when recovery is requested. Other aspects of recovery, such as data dependencies (recovery order), may also be recorded as metadata in the catalog 15-320.
- the system can now invoke an automatic recovery workflow and be assured that the appropriate method for each SaaS will be invoked, regardless of the structure of data objects.
- the backup may be entirely done by the respective SaaS (such as in the CloudSQL 340-1 example above), or completely managed by the R-Cloud plugin module 15-330-2 discovering the appropriate methods (such as in the DropBox 15-340-2 example), or some mix of the two.
- the R-Cloud plug-ins 15-330 use the catalog 15-320 to understand the specifics of each SaaS service's 340 backup abilities, freeing the end user 15-350 from having to know these details.
- Fig. 16 is a more detailed depiction of the R-Cloud Platform 15-310 and how the API 15-360 can be leveraged by the user 15-350.
- discovery of each SaaS is performed to determine if it has a corresponding data protection method and its attributes, such as a backup method, restore method, configuration method, or status method.
- Other information such as lists of required attributes and optional attributes is also collected. The specifics of each method and list of attributes differs depending on the type of SaaS application 340.
- the R-Cloud Platform includes an R-Cloud Manager 16-410 component, a Service Data Definition 16-420, a Service Data Management 16-430, and the R-Cloud Modules 15-330.
- Each R-Cloud Module 15-330 is programmed to access its associated SaaS application 15-340 such as through an Application Programming Interface (API) 15-335.
- the Service Data Definition 16-420 consists of methods that include an authentication method 16-422 and a discovery method 16-424. These methods are used to discover attributes of a SaaS 340 resource, such as during a LIST operation. Each such LIST operation may return a list that describes certain aspects of the structure of the SaaS application. The structure may identify a list of required attributes that the R-Cloud platform 15-310 will then use to drive backup and restore methods, as well as an optional list of other attributes.
- Service Data Management 16-430 may include methods for defining backup options 16-432, backup execution 16-434, defining recovery services 16-436, and recovery execution 16-438.
- the attributes 15-370 discovered for each SaaS may include values for an identifier, name, and SaaS type. Also discovered may be attributes such as whether or not the SaaS has other related dependent or subservient data objects, provides its own backup method, defines a backup sequence, or defines a restore sequence. Still other attributes may include whether the SaaS can display metadata, its location, and other metrics.
- An example of a discovered attribute for a SaaS is its »canBackup« attribute. This indicates to the R-Cloud platform 15-310 that the SaaS implements a native backup method.
- Example optional attributes may further define the »canBackup« attribute to specify, at one or more levels of a data hierarchy, whether backup protection is available.
- the »hasSubResources« attribute can be set to True.
- the child resources may be further defined as optional attributes, such as a list of cloud SQL servers, a list of SQL instances running on each server, a list of databases running on each SQL instance, and a list of tables in each database.
- the optional attributes may further specify a »canBackup« attribute for each object in the list, such that it can be determined whether each server, instance, database, and table can or cannot be backed up at its corresponding level.
- the catalog indicates that the data objects 380 include a file structure that has a root (top level) folder that has SubResources.
- a resource A itself is a folder that has SubResource C. Resource C does not have any child resources.
- the hasSubResources attribute for object B likewise indicates that it does not have any child resources.
- a developer can use other attributes, such as the backupSeqGroup and/or restoreSeqGroup attributes, to control the order of operations.
- the instance resource type may have restoreSeqGroup set to 1 and all database resources will have restoreSeqGroup set to 2. This means that the R-Cloud platform will execute the restore of the instance (group 1) before the restores of the databases (group 2).
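- As a small illustration of this ordering, resources can simply be sorted by their restoreSeqGroup before restore methods are invoked; the helper below and its resource names are hypothetical.

```python
def restore_order(resources: list[dict]) -> list[str]:
    """Return resource names in the order the platform would restore them."""
    ordered = sorted(resources, key=lambda r: r.get("restoreSeqGroup", 0))
    return [r["name"] for r in ordered]

print(restore_order([
    {"name": "db-sales",   "restoreSeqGroup": 2},
    {"name": "instance-1", "restoreSeqGroup": 1},
    {"name": "db-hr",      "restoreSeqGroup": 2},
]))  # -> ['instance-1', 'db-sales', 'db-hr']: the instance is restored first
```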
- Fig. 17 is a high level flow diagram of a method that may be used to implement aspects of this data protection scheme using the system and features as described above.
- a set of services are identified.
- in step 17-504, access points for each given service are determined.
- in step 17-506, for each access point, data objects are determined.
- the data objects in the case of a SQL service may include one or more databases.
- the granularity of the data objects may be projects, folders and files.
- Other services may have other types of data objects.
- in step 17-508, a data protection attribute for each data object is determined. As explained above, this may include a »canBackup« attribute for that data object.
- This information is then stored in a catalog in the next step 17-510.
- Step 17-520 represents some later time at which a user may browse the catalog and, in step 17-522, select an access point or an object and review or change its available data protection status.
- in step 17-530, which is some later time still, a recovery workflow is invoked.
- the catalog may thus be accessed at step 17-532 to discover data protection schemes in use and then instantiated at step 17-534. Note that this catalog is configured and maintained outside of SaaS itself, even if a given service itself provides protection.
- the workflow of the example embodiments described above may be implemented in many different ways.
- the various “data processors” may each be implemented by a physical or virtual or cloud-based general purpose computer having a central processor, memory, disk or other mass storage, communication interface(s), input/output (I/O) device(s), and other peripherals.
- the general-purpose computer is transformed into the processors and executes the processes described above, for example, by loading software instructions into the processor, and then causing execution of the instructions to carry out the functions described.
- such a computer may contain a system bus, where a bus is a set of hardware lines used for data transfer among the components of a computer or processing system.
- the bus or busses are essentially shared conduit(s) that connect different elements of the computer system (e.g., one or more central processing units, disks, various memories, input/output ports, network ports, etc.) that enables the transfer of information between the elements.
- One or more central processor units are attached to the system bus and provide for the execution of computer instructions.
- I/O device interfaces are provided for connecting the disks, memories, and various input and output devices.
- Network interface(s) allow connections to various other devices attached to a network.
- One or more memories provide volatile and/or non-volatile storage for computer software instructions and data used to implement an embodiment. Disks or other mass storage provides non-volatile storage for computer software instructions and data used to implement, for example, the various procedures described herein.
- Embodiments may therefore typically be implemented in hardware, custom designed semiconductor logic, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), firmware, software, or any combination thereof.
- the procedures, devices, and processes described herein are a computer program product, including a computer readable medium (e.g., a removable storage medium such as one or more DVD-ROM's, CD-ROM's, diskettes, tapes, etc.) that provides at least a portion of the software instructions for the system.
- Such a computer program product can be installed by any suitable software installation procedure, as is well known in the art.
- at least a portion of the software instructions may also be downloaded over a cable, communication and/or wireless connection.
- Embodiments may also be implemented as instructions stored on a non-transient machine-readable medium, which may be read and executed by one or more processors.
- a nontransient machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device).
- a nontransient machine-readable medium may include read only memory (ROM); random access memory (RAM); storage including magnetic disk storage media; optical storage media; flash memory devices; and others.
- firmware, software, routines, or instructions may be described herein as performing certain actions and/or functions. However, it should be appreciated that such descriptions contained herein are merely for convenience and that such actions in fact result from computing devices, processors, controllers, or other devices executing the firmware, software, routines, instructions, etc.
- block and system diagrams may include more or fewer elements, be arranged differently, or be represented differently. But it further should be understood that certain implementations may dictate the block and network diagrams and the number of block and network diagrams illustrating the execution of the embodiments be implemented in a particular way.
- Embodiments may also leverage cloud data processing services such as Amazon Web Services, Google Cloud Platform, and similar tools.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method for assessing data resilience status of a data processing environment that includes generating a model of data processing resources within the environment, and generating a data resilience-graph (R-graph) based on the model, the R-graph including an object for each resource. The object for each resource further includes at least a compliance attribute and a criticality attribute. The method may be augmented by a discovery process that leverages an Identity Provider (IdP) service. An API may also be used to interact with the R-Graph, for example, to display or manipulate a domain-specific view by applying compliance and criticality rules to the objects in the R-graph.
Description
ENABLING SERVICE DATA PROTECTION THROUGH DISCOVERY, R- GRAPH PROPAGATION AND API MODEL
CROSS REFERENCE TO RELATED APPLICATIONS
[001] This patent application claims priority to co-pending U.S. Provisional Patent Appl. No. 63/442,138, entitled "DISCOVERY OF SERVICES IN COMBINATION WITH ENABLING DATA PROTECTION AND OTHER WORKFLOWS", filed January 31, 2023; U.S. Provisional Patent Appl. No. 63/442,139, entitled "R-GRAPH PROPAGATION OF DATA PROTECTION AND COMPLIANCE STATUSES", filed January 31, 2023; and U.S. Provisional Patent Appl. No. 63/442,140, entitled "API MODEL FOR AS-A-SERVICE DATA RESILIENCE MANAGEMENT", filed January 31, 2023, each of which is incorporated by reference herein.
TECHNICAL FIELD
[002] This patent application relates to methods and systems for determining data resiliency, including data protection and compliance status, for complex data processing environments. It further relates to a structured way to describe any Service (e.g., SaaS, PaaS, DBaaS, IaaS) related data and configuration and the associated data resilience management methods to enable the ability to protect and recover data with minimal complexity.
BACKGROUND
[003] Data protection is the process of safeguarding important information from corruption, compromise, or loss. Enterprises such as businesses, organizations, universities and other large
groups of users also have little tolerance for downtime that can make it impossible to keep the enterprise running.
[004] Consequently, a large part of a data protection strategy is ensuring that data can be restored quickly after any corruption or loss. Protecting data from compromise and ensuring data privacy are other key components of data protection.
[005] As data moves to cloud and edge applications, enterprises must adapt. With the increasing sophistication of security attacks, and as the data environment changes, relying on legacy data protection technologies makes this complex and expensive to manage and operate. [006] Cloud Services (e.g., SaaS (software as a service), PaaS (Platform as a Service), DBaaS (Database as a Service), IaaS (Infrastructure as a Service)) have become an integral part of many business computing environments. The advantages of these cloud services are well known and include the ability to scale to meet demand as needed, and to only pay for what is needed. It is also expensive and time-consuming to maintain any software application on a regular basis. However, with these "as-a-Service" deployments, the service provider may itself provide for data backup and maintenance, including data protection, which frees the business' own staff from complex software and hardware management.
[007] Data Protection as a Service allows organizations to reduce risk and shift from owning and maintaining backup infrastructure to simply accessing and utilizing it in a pay-as- you-go model. They choose how much compute, networking, and storage they might need based on previous workloads, with the ability to scale when demand changes. They also specify encryption, retention, and security policies as part of their lease and leave backup storage planning and deployment to the data protection vendor.
[008] Every organization has 10s to 100s to 1000s of applications, services, systems, data sources and other data processing resources that support their operations. The ability to understand whether the data processing resources used by an organization are protected or compliant is a continuing challenge. The status of such entities is not a simple "good/bad" parameter (especially from a data protection perspective). For example, data critical to an organization may be protected, and even most workloads might be protected. However, when other workloads have failed, reliance on the status of data backup alone may lead to an incorrect
conclusion that the business as a whole is protected.
[009] Some SaaS applications provide internal backup. In addition, a final data protection policy may consist of multiple layers. These data protection layers may include SaaS internal backup, separate Data Protection as a Service (DPaaS) products (such as those provided by HYCU, Inc.), other methods for data replication, archiving, and the like. More generally, business operations are considered to be protected when a Service Level Objective (SLO) is achieved, regardless of the implementation specifics. Thus, the actual status of an enterprise's data protection is based on what the expectations are.
SUMMARY OF PREFERRED EMBODIMENT(S)
[0010] What is needed is a way for an organization to quickly assess their data resilience, such as data protection and compliance status, both as a whole and for individual departments/functions. However, a mere count of applications, services, and data sources that are protected or not protected is insufficient. A better solution would enable Information Technology (IT) managers/CIO/CISOs to get a quick view of whether they should take immediate action or not. The solution should take into consideration that: a) even if applications/data sources are not protected they may or may not be critical; b) just because an important resource is not protected or not in compliance, does not mean the organization needs to panic — because other alternate sources might already be providing a solution; and c) modern applications and services are usually a complex distributed architecture of various services and data across different technology stacks (and/or computing environments or public/private clouds) — thus, individual items' protection/compliance does not mean the protection of the required application/service.
[0011] The systems and methods described in this patent application enable organizations to automate the creation of the holistic data protection and compliance status across and within complex applications, services, and organizations. The ability to understand this overall status is crucial for IT personnel to be able to monitor, manage and report against the top-level business services.
[0012] The ability to calculate and propagate this status from within an organization unit or across the overall organization is a complex task.
[0013] The approach taken here determines the data protection and compliance status for an organization's data processing resources on a per-organization-unit basis. The protection and
compliance status is determined across the entire data processing environment, including hosted applications (which may include combinations of physical or virtual machines, databases, and storage devices) and cloud services (which may include workloads, data storage, and services on SaaS or other services such as those provided by Google). The approach provides insight into the status of the overall organization (e.g., top-level corporate) or at other (e.g., department or operational) organization levels.
[0014]
[0015] Leveraging Identity Provider Service for Discovery and Data Protection Implementation
[0016]
[0017] In one aspect, an approach to data protection leverages an Identity Provider (IdP) service to discover SaaS/DBaaS/PaaS or other services, and then automatically applies an appropriate data protection scheme for such services.
[0018] More particularly, the approach brings a new level of SaaS awareness to the modern and complex multi-cloud environments by enabling automatic detection of SaaS services that are hosted outside of company infrastructure and to then interact with them. Such interactions may include, for example, confirming how data protection attributes are configured or other interactions.
[0019] The automatic SaaS discovery process leverages a user authentication service such as an Identity Provider (IdP) service. Automatic SaaS discovery can be triggered either as a scheduled job or as a response to an external event (for example, when a new virtual SaaS service is integrated within an IdP Single Sign-On (SSO) service). The only action required by the user is to provide IdP credentials for the discovery process to access the remote IdP service(s). [0020] As a result, end users are able to connect to their identity management provider(s), which will then automatically gather the SaaS/DPaaS/PaaS services used by their organization. An Application Data catalog is then leveraged to automatically categorize the SaaS/DPaaS/PaaS, determine a method required to understand the current protection status, and to then deploy the appropriate data protection primitives for that SaaS/DPaaS/PaaS - all of this without the user having to engage in manual operations.
[0021] Data Resilience Graph (R-Graph) Representation of Protection and Compliance
[0022]
[0023] In some embodiments, a processor may generate a model of data processing resources within a data processing environment. Generally speaking, a “data processing resource” may include any feature, service, product, or attribute of the service to which a policy may be assigned. A data resilience-graph (R-graph) may be generated based on the model, where the R-graph includes an object for each resource group, where for example, each resource group may include one or more resources. Each resource group may be represented by a leaf in the R-graph. The object for each resource group may further include at least a compliance or protection attribute and a criticality attribute. A processor may display a domain-specific view by applying compliance and criticality rules to the objects in the R-graph.
[0024] In some aspects, the techniques described herein relate to a method for assessing data resilience status of a data processing environment including: generating a model of data processing resources within the environment; generating a data resilience-graph (R-graph) based on the model, the R-graph including an object for each resource, the object for each resource further including at least a compliance attribute and a criticality attribute; applying compliance and criticality rules to the objects in the R-graph; and displaying a domain-specific view of the R-graph.
[0025] In other aspects, the techniques described herein relate to an apparatus for assessing compliance status of a data processing environment including: one or more data processors; and one or more computer readable media including instructions that, when executed by the one or more data processors, cause the one or more data processors to perform a process for: generating a model of data processing resources within the environment; generating a data resilience-graph (R-graph) based on the model, the R-graph including an object for each resource, the object for each resource further including at least a compliance attribute and a criticality attribute; applying compliance and criticality rules to the R-graph; and displaying a domain-specific view from the resulting R-graph.
[0026]
[0027] API Model for Managing Data Resilience
[0028]
[0029] Furthermore, in some implementations there should be a single, simple, and consistent mechanism for developers/integrators to define the structure of data in a Cloud Service, its attributes, and the mechanisms to back up and recover it. The definition should be expressed in the context of the service and should help present the information in the context of the service itself. These attribute and mechanism definitions should be consumed by the data protection platform, allowing it to discover, protect and recover the service in the context of the service. Most important of all is the ability to provide service-specific recovery workflows. To make the entire process scalable, the integrator should not be expected to develop any custom UI code to create the recovery or the backup workflows.
[0030] In summary, an integrator/developer/administrator accesses a platform that provides a facility to define the way that data are protected by a service. The platform can be used, for example, to determine whether the SaaS itself provides a recovery method, and at which level of granularity those recovery method(s) are provisioned. This information can then be stored in a metadata catalog, to record what "level" of recovery the SaaS itself can provide as well as the specific configuration deployed for each end user.
[0031] More particularly, a platform we call the R-Cloud Platform provides an easy way to specify the data resilience (including backup and/or recovery) workflow. The platform provides the facility for an integrator/developer to define the way the configuration and data are held by the service. This definition is flexible to accommodate a wide range of SaaS applications. In addition to the definition, the platform allows the service developer to provide a simple abstraction of how to back up and recover the different parts of the service. This allows the platform to support different levels or granularity of recovery for every service.
[0032] To enable the protection and granular recovery of the service-related data, the platform also provides a way to maintain the varying types of metadata associated with the granularity of the data being protected and leverages it for granular recovery.
[0033] The service definition and resilience methods are orchestrated and leveraged by the R-Cloud Platform to deliver data protection for every aaS that integrates with it. The platform delivers a significant amount of capability, including but not limited to:
[0034] backup data management / retention / copies
[0035] policy management / scheduling
[0036] consolidation / reporting
[0037] consumption management / reporting
[0038] billing (including through various marketplaces from hyper-scalers like Google, Amazon, Azure)
[0039] This innovation enables the as-a-Service integrator/developer to start with just providing two sets of structured information to the platform to create data resilience for the Service. The two are:
[0040] Service Data Definition
[0041] Service Data Management
[0042] The Service Data Definition is intended for the integrator/developer of the as-a-Service to define the following (including, but not limited to): different levels of hierarchy of the resources within the as-a-Service, such as different data objects or groups of data objects, description of the data objects, the type of data in each of the objects, if the objects have associations with other objects defined in the structure, ability to protect the object, and the sequence in which the data needs to be backed up and recovered.
[0043] The Service Data Management can have multiple parts, but the minimum required are the Backup and Restore methods.
[0044] For each of the resource types defined in the Service Data Definition, the as-a-Service Integrator/Developer can define the required processing method to protect that part of the object from the service. This is for both Backing up and Restoring.
[0045] The platform leverages the Service Data Definition to discover the internals of the data stored in the service, provide backup methods for different data (resource) types as defined by the integrator and provide a User Interface driven restore (restore scenario definition) of the data specific to that Service by invoking the right associated methods.
BRIEF DESCRIPTION OF THE DRAWINGS
[0046] Features and advantages of the approaches discussed herein are evident from the text that follows and the accompanying drawings, where:
[0047]
[0048] Fig 1 is an overview of an example SaaS awareness implementation.
[0049] Fig 2 is an example of discovered SaaS services/applications.
[0050] Fig 3 is an overview of an example core data structure for SaaS-service/application discovery.
[0051] Fig. 4 is a flow diagram for an automated discovery and data protection process.
[0052] Fig. 5 shows a model of a data processing environment that is augmented with a Data
Resilience Graph (“R-Graph”) and domain-specific viewer.
[0053] Fig. 6 is a typical data processing environment in more detail.
[0054] Fig. 7 is an example architecture of a complex application.
[0055] Fig. 8 is one example of R-Graph attributes.
[0056] Fig. 9 illustrates example attributes of R-Graph entities.
[0057] Fig. 10 is a table that describes the complete logic for propagation of the protected status in the environment.
[0058] Fig. 11 is a resulting display of compliance illustrating criticality.
[0059] Fig. 12 is an example high level flow for applying the propagation logic.
[0060] Fig. 13 shows the access points and data organization for an example Google Cloud
SQL SaaS application.
[0061] Fig. 14 shows the access points and data organization for a more complex SaaS application, such as DropBox.
[0062] Fig. 15 shows an R-Cloud Platform that manages a catalog that represents the data organization for a SaaS.
[0063] Fig. 16 is an example of how the R-Cloud Platform interacts with a data catalog.
[0064] Fig. 17 is a flow diagram of a method that may be used to implement data protection as described herein.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENT(S)
[0065] Leveraging Identity Provider Service for Discovery and Data Protection Implementation
[0066]
[0067] First Phase of Discovery
[0068]
[0069] Fig. 1 illustrates an example data processing environment 100 where a process for automated discovery of SaaS services and/or applications may be implemented. The environment 100 may be a typical enterprise such as a business, university, organization, or other group of individual users that access a set of SaaS services and/or applications 130-1, 130-2, ..., 130-n. For example, the SaaS 130 may include SalesForce, CloudSQL, DropBox and other SaaS services/applications. It should be understood, however, that other enterprises are different and that different or additional SaaS services/applications 130 may be deployed.
[0070] The enterprise utilizes an identity provider (IdP) service 120, such as one using Single Sign-On (SSO) (like Okta or Azure AD (Entra)), to control access to the SaaS services and/or applications 130. SSO and similar IdP services 120 permit each user to use one set of login credentials (for example, a username and password) to access multiple SaaS services/applications 130, simplifying the management of multiple login credentials.
[0071] In this example, the SaaS awareness function (referred to herein as the “R-Cloud Platform” 110) accesses the IdP 120 to in turn access the SaaS 130-1, 130-2, ..., 130-n in order to discover which SaaS services are in use by the enterprise.
[0072] The R-Cloud Platform 110 may include the following operations:
[0073] Remotely detecting if a SaaS service 130 is provisioned and running (SaaS service discovery).
[0074] Remotely detecting details about the SaaS service 130 (for example, the version of the service, the identification of logical entities within the SaaS services, respective data and metadata hosted within the service, and so on).
[0075] Remotely detecting storage consumption of the service 130 (if available)
[0076] Remotely detecting number of provisioned/subscribed users
[0077]
[0078] To provide SaaS awareness, an R-Cloud Module 140-1, 140-2, ..., 140-n may be specifically designed for each SaaS service 130-1, 130-2, ..., 130-n. The R-Cloud Platform exposes a set of interfaces and types and preferably enforces a common hierarchy and uniformity of SaaS-specific implementations within the different R-Cloud Modules 140. For example, there may be an R-Cloud Module 140-1 for Salesforce, a different R-Cloud Module 140-2 for Dropbox, etc. The internal implementation for each application probe can be done in any way that best suits the needs of the particular SaaS service.
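As a non-limiting illustration only, one possible shape for such a per-SaaS module interface is sketched below in Python. The class and method names are hypothetical and are not part of any defined platform API; they merely show how a common interface can be enforced while leaving the SaaS-specific internals to each module.

```python
from abc import ABC, abstractmethod
from typing import Any


class RCloudModule(ABC):
    """Hypothetical common interface enforced across per-SaaS modules."""

    @abstractmethod
    def discover(self, credentials: dict) -> dict:
        """Return service details (version, logical entities, storage use, users)."""

    @abstractmethod
    def list_resources(self) -> list[dict]:
        """Return resource objects with attributes such as canBackup."""

    @abstractmethod
    def backup(self, resource_id: str, options: dict) -> Any:
        """Back up one resource using the SaaS-specific API."""

    @abstractmethod
    def restore(self, resource_id: str, options: dict) -> Any:
        """Restore one resource using the SaaS-specific API."""


class SalesforceModule(RCloudModule):
    """Sketch of one SaaS-specific implementation; internals are left to the integrator."""

    def discover(self, credentials):
        return {"type": "salesforce", "provisioned": True}

    def list_resources(self):
        return [{"id": "org-data", "canBackup": True}]

    def backup(self, resource_id, options):
        ...

    def restore(self, resource_id, options):
        ...
```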
[0079] The R-Cloud (SaaS) Module may be deployed as an application running within the enterprise or as an external service. The modules can thus be deployed in two modes:
[0080] Within the customer (e.g., enterprise) environment
[0081] Within a dedicated environment for each customer on backend hosted by the provider of the SaaS awareness service.
[0082] The first or discovery stage 112 of SaaS awareness therefore consists of remotely detecting if a SaaS service 130 is integrated within the customer environment. This can be done by leveraging an IdP 120 Single Sign-On (SSO) service used by the enterprise. Example SSOs may include OKTA and Azure. On OKTA and Azure AD, this information can be retrieved through the respective REST APIs.
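For illustration, a minimal sketch of such a query is shown below. The endpoint path, authorization scheme, and response fields are written in an Okta-like style but should be treated as assumptions; the exact REST API, pagination, and field names vary by IdP.

```python
import requests


def discover_saas_from_idp(idp_base_url: str, api_token: str) -> list[dict]:
    """Query an IdP's application inventory to list SaaS services integrated with SSO.

    Endpoint and fields are illustrative (Okta-style); adapt to the actual IdP API.
    """
    headers = {"Authorization": f"SSWS {api_token}", "Accept": "application/json"}
    resp = requests.get(f"{idp_base_url}/api/v1/apps", headers=headers, timeout=30)
    resp.raise_for_status()

    discovered = []
    for app in resp.json():
        discovered.append({
            "name": app.get("label"),         # display name of the SaaS application
            "status": app.get("status"),      # e.g. ACTIVE
            "sign_on_mode": app.get("signOnMode"),
        })
    return discovered
```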
[0083] Fig. 2 is an example map of resulting discovered SaaS services. If the SaaS service is not integrated within the customer’s SSO service, it can instead be added manually via a user interface to the list of discovered SaaS services.
[0084] The list of discovered services/applications may carry attributes such as a “name” associated with each service, and a graphically indicated status of various features of the service, such as compliance, protection, and discovery. In this example, information reflecting the discovered services is arranged in a tree referred to herein as an R-graph 200. The example shows an enterprise called HYCU that uses a mix of Software as a Service (SaaS) resources and hosted resources. Here the Engineering department 202 uses Jira 203, Confluence 204, and GitHub 205 services that they access as SaaS; the Finance Department 210 uses Navision 211 and Tipalti 212; Legal 220 uses Docusign 221 and a shared data repository 222; the Sales Department 230 uses SalesForce 232 and a couple of hosted resources (a Demo Data Center 234 and Demo Cloud 236); and Operations 240 does not yet have any managed resources.
[0085]
[0086] Second Phase of Discovery
[0087]
[0088] The second stage of SaaS awareness involves gathering SaaS-application-specific information. These attributes may be discovered during a LIST operation (implemented by each R-Cloud Module 140) on the respective SaaS application 130.
[0089] The R-Cloud Platform 110 includes a service data management function that discovers service attributes, stores them, and then uses that information to drive backup and restore workflows and optional attributes.
[0090] Fig. 3 is an overview of an example core data structure that may be used by the R-Cloud Platform 110 to implement SaaS service/application discovery. In general, discovery of SaaS 140 is performed to determine if it has a corresponding backup method, restore method, configuration method, status method, and other information, such as lists of required attributes and optional attributes. The specifics of each method and list of attributes differs depending on the type of SaaS 140.
[0091] More particularly, the R-Cloud Platform includes an R-Cloud Manager 310 component, a Service Data Definition 320, and the R-Cloud Modules 140. Each R-Cloud Module 140 is programmed to access its associated SaaS application 130 such as through an Application Programming Interface (API) 325. There is a different API 325 for each SaaS 130. [0092] The Service Data Definition consists of resource objects 340 which correspond to the attributes of a corresponding discovered SaaS application 130. These attributes may be discovered such as during a LIST operation on the SaaS-application. Each such LIST operation may return a
[0093] list that describes certain aspects of the structure of the SaaS application. The structure may identify a list of required attributes that the R-Cloud platform 110 will then use to
drive backup and restore methods, as well as an optional list of attributes that are meaningful only to the module.
[0094] As shown in Fig. 3, these discovered attributes 350 may include values for an identifier 351, name 352, and type 353 of the SaaS 130. Also included are attributes such as whether or not it has other related dependent or subservient services 354, provides its own backup method 355, defines a backup sequence 356, or defines a restore sequence 357. Still other attributes may include whether it can display metadata 358, its location 359, and other metrics 360.
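Purely as an illustration, a discovered-attributes record of this kind might be serialized as follows. The field names and values are hypothetical and are not prescribed by the platform; the numbers in the comments refer to the attributes 351-360 described above.

```python
# Hypothetical serialized form of discovered attributes 350 for one service.
discovered_service = {
    "id": "svc-0042",                # identifier (351)
    "name": "CloudSQL - production", # name (352)
    "type": "DBaaS",                 # type (353)
    "hasSubResources": True,         # related dependent/subservient services (354)
    "canBackup": True,               # provides its own backup method (355)
    "backupSeqGroup": 1,             # defines a backup sequence (356)
    "restoreSeqGroup": 1,            # defines a restore sequence (357)
    "canDisplayMetadata": True,      # (358)
    "location": "europe-west1",      # (359)
    "metrics": {"storage_gb": 120, "subscribed_users": 250},  # (360)
}
```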
[0095] As a result, the R-Cloud Platform has now discovered the extent of the default data protection built into a Service. This allows the end users to determine what is already present in the system and what additional data protection they may desire.
[0096] An example of a required attribute is the »canBackup« attribute which indicates to the R-Cloud platform 110 that a SaaS implements a backup method.
[0097] Example optional attributes may further qualify the »canBackup« attribute to specify one or more levels of a hierarchical resource at which backup protection is or is not implemented. For an example CloudSQL SaaS, the »hasSubResources« attribute can be set to True. The child resources may be further defined as optional attributes, such as a list of Cloud SQL servers, a list of SQL instances running on each server, a list of databases running on each SQL instance, and a list of tables in each database. The optional attributes may further specify a
[0098] »canBackup« attribute for each object in the list, such that it can be determined whether each server, instance, database, and table can or cannot be backed up at its corresponding level.
[0099] Similarly, optional child attributes of a DropBox SaaS may include a file structure hierarchy including top level personal / public / shared folders, a subfolder under each such top level folder, and then files within each subfolder. The optional attributes may thus specify whether this particular DropBox resource can be backed up, or not, at each level of the top level/subfolder/file hierarchy.
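As an illustrative sketch only, the per-level attributes for the two example services might be represented as nested records like the following; the structure and field names are assumptions made here for clarity, not a prescribed schema.

```python
# Illustrative per-level »canBackup« attributes for two example services.
cloudsql_resources = {
    "name": "sql-server-1", "type": "server", "canBackup": True,
    "hasSubResources": True,
    "children": [{
        "name": "instance-a", "type": "instance", "canBackup": True,
        "children": [{
            "name": "orders_db", "type": "database", "canBackup": True,
            "children": [
                {"name": "orders", "type": "table", "canBackup": False},
                {"name": "customers", "type": "table", "canBackup": False},
            ],
        }],
    }],
}

dropbox_resources = {
    "name": "root", "type": "folder", "canBackup": True,
    "children": [{
        "name": "shared", "type": "folder", "canBackup": True,
        "children": [{"name": "report.pdf", "type": "file", "canBackup": False}],
    }],
}


def backupable(node: dict) -> list[str]:
    """Walk a resource tree and return the names of items that can be backed up."""
    found = [node["name"]] if node.get("canBackup") else []
    for child in node.get("children", []):
        found.extend(backupable(child))
    return found
```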
[00100] The child attributes therefore enable the discovery service to be adapted to different use cases.
[00101] The configuration method may include configuration options and UI attributes (such as access credentials) for implementing the actual backup and restore methods.
[00102] Fig. 4 is a high level flow chart for an example automated service discovery and data protection process that may be implemented within the system above.
[00103] In a first step 402, access is provided to an authentication service.
[00104] In a next step 404, that authentication service is queried to automatically discover the services that have been provisioned.
[00105] In step 406, further details about the data configuration for each service are discovered. As explained above, this may include a service version, identification of logical entities within the service, or respective data and metadata being hosted within the service.
[00106] Further options regarding the service can also be retrieved in step 408. As explained above, this may include things such as storage consumption, the number of subscribed users, and other attributes that may assist with data protection.
[00107] The remaining steps are typically carried out by an R-Cloud module that is specifically designed for each service.
[00108] As explained above, an R-Cloud module for a SQL database service will perform different functions from an R-Cloud module for a Dropbox service.
[00109] At this point, such as at step 410, an appropriate R-Cloud plugin for each service is invoked to discover service-specific attributes.
[00110] As explained above, this second stage of discovery determines, for example in step 412, the data protection attributes of each service, such as whether the service has a »canBackup« attribute.
[00111] Attributes of the user's configuration of each service are then discovered in step 414. As explained for the examples above, if the service is a SQL database service then information regarding parent-child databases can be retrieved, whether each database can be backed up or restored, or to what extent backup and restore operations can be handled by the service.
[00112] As a final step 416 for each service, the appropriate data protection primitives are enabled - either as made available by the service or as separately configured for the enterprise.
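A compact sketch of this end-to-end flow is given below. The `idp` client, the module registry, and their method names are hypothetical stand-ins introduced only to tie the steps of Fig. 4 together; they do not correspond to any particular product API.

```python
def automated_discovery_and_protection(idp, module_registry):
    """Illustrative orchestration of the Fig. 4 flow (all object APIs are hypothetical)."""
    services = idp.list_provisioned_services()          # steps 402-404: query the IdP
    for svc in services:
        details = idp.describe(svc)                     # step 406: version, logical entities
        usage = idp.usage(svc)                          # step 408: storage use, user counts
        module = module_registry[svc["type"]]           # step 410: pick the R-Cloud module
        attrs = module.discover(details)                # step 412: e.g. canBackup attribute
        config = module.list_resources()                # step 414: parent/child data objects
        enable_protection(svc, attrs, config, usage)    # step 416: enable protection primitives


def enable_protection(svc, attrs, config, usage):
    """Use service-native backup where available; otherwise configure separate protection."""
    if attrs.get("canBackup"):
        print(f"{svc['name']}: using service-provided backup")
    else:
        print(f"{svc['name']}: configuring separate data protection")
```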
[00113]
[00114] Data Resilience Graph (R-Graph) Representation of Protection and Compliance
[00115]
[00116] Fig. 5 shows how a model 5-100 of a data processing environment 5-102 can be augmented with a Data Resilience Graph (“R-Graph”) 5-104 and a domain-specific viewer 5- 106. A typical enterprise’s data processing environment 5-102 may consist of a wide variety of resources such as hosted applications, services, public/private clouds, servers, storage, databases, processors, and many other types of data processing elements. A model 5-100 of these resources may be developed and maintained in different ways such as via Simple Network Management Protocol (SNMP), Common Information Model (CIM), or other methods that define how the managed resources in an IT environment 5-102 are represented as a common set of objects and relationships between them.
[00117] The Model 5-100 may include the ability to collect the compliance and protection status of the managed resources. As is known in the art, this status information may be automatically discovered via agents, plug-ins, Application Programming Interfaces (APIs) and the like installed in the managed resources. However, this information may also be collected in other ways, such as manually. The collected status information is then stored in a Data Resilience Graph (“R-Graph”) data structure 5-104, an example of which is discussed in more detail below.
[00118] The R-Graph 5-104 may be augmented with criticality information 5-108 that is further processed by rules we call propagation logic 5-110. The criticality 5-108 for a particular resource may differ depending on the perspective of different domains, such as departments or functions, within the enterprise. Thus a given resource may have different criticality values for different domains.
[00119] A viewer application 5-106 then provides a display of one or more aspects of the R-Graph to an IT manager or other user after application of the propagation logic 5-110. The display generated by the viewer 5-106 is tailored to the specific domain of interest. For example, the user may only be interested in the compliance status from the perspective of a particular department in an enterprise. A typical modern company has dozens of complex business applications and services which are at the heart of the business availability status. Some of them
are critical to the operation of the business, some of them have a standard importance, and some of them have almost no impact on the core business. Thus, a resource considered critical by one department may not be critical for another. Or perhaps upper level management prefers to know the status of the organization as a whole, and considers all of the systems that support a given department (such as sales or manufacturing) more critical than another department's systems (such as engineering).
[00120] Fig. 6 shows an example R-graph 6-200 for an example enterprise 201 called HYCU that uses a mix of Software as a Service (Saas) resources and hosted resources.
[00121] In this example, the Engineering department 6-202 uses Jira 6-203, Confluence 6- 204, and GitHub 6-205 services they access as SaaS; the Finance Department 6-210 uses Navision 6-211 and Tipalti 6-212; Legal 6-220 uses Docusign 6-221 and a shared data repository
6-222; the Sales Department 6-230 uses SalesForce 6-232, and a couple of hosted resources (a Demo Data Center 6-234 and Demo Cloud 6-236), and Operations 6-240 does not yet have any managed resources. The user’s mouse is hovering over the Tipalti 6-212 resource and can see it was last backed up on 30 August. The checkmarks next to the different resources indicate their compliance status; an “x” indicates a resource that is not in compliance (e.g., Navision). A “shield” next to a resource may indicate the data protection status for the node, and a “dot” next to a resource may indicate its compliance status.
[00122] Fig 7 is an example architecture of a complex hosted application 7-300. It consists of a set of resources including a pair of application servers 7-302-1, 7-302-2, a pair of load balancers
7-304-1, 7-304-2, a master database 7-306, and a pair of replica databases 7-308-1, 7-308-2.
[00123] The ability to understand the health, protection status and compliance of the different SaaS services in use as depicted in the example enterprise of Fig. 6, as well as of a hosted application (as shown in the example of Fig. 7), requires an algorithm, referred to herein as propagation logic, that takes into consideration:
[00124] data protection status
[00125] compliance SLOs
[00126] criticality
[00127] To this end, the Model 5-100 collects this data protection, compliance, and criticality information for each resource. To obtain a result for whether the hosted application of Fig. 7 as a whole is protected, the SLO may specify that it is sufficient if only one of the replicas is protected. Thus the resulting R-Graph propagation logic 5-110 may determine that the application is protected if Replica 1 OR Replica 2 is protected. However, in another enterprise, the SLO may specify the need for both Replica 1 AND Replica 2 to be protected for the application as a whole to be considered protected. And the SLO may consider the Replicas “critical”, but still specify that Replica 1 OR Replica 2 is sufficient since they are replicas and not the Master Database.
[00128] Every organization's protectable data processing resources (entities) can thus be described by tracking the entities shown in the example table 8-400 of Fig. 8 in an R-Graph 5-104. The type of source object can be specified as various types (e.g., virtual machine (VM), fileserver, container, SaaS service, application, etc.). Groups of resources can also be defined and the groups can be nested. In this example, a “VMs” group consists of the two virtual machines (vm1, vm2) and an “application 1” group consists of the VMs group, a SalesForce SaaS and a CloudSQL service.
[00129] In this way any business application, service or data in the environment can be described, including even the overall organization (with a group that groups everything at a top level).
[00130] Fig. 9 illustrates a table 9-500 that is but one example of the possible attributes of R-Graph entities in more detail. Here a source object may have further attributes other than just a type (an illustrative entity sketch follows the list below), such as
[00131] RPO (Recovery Point Objective)
[00132] RTO (Recovery Time Objective)
[00133] Retention (Retention Policy)
[00134] Protected Status = [Protected, Protected-With-Warnings, Not-Protected]
[00135] Compliance Status
[00136] Criticality Status = [Critical, Standard, Excluded]
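By way of illustration only, one way such an entity could be represented is the following Python sketch; the class and field names are hypothetical and simply mirror the attributes listed above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Protected(Enum):
    PROTECTED = "Protected"
    PROTECTED_WITH_WARNINGS = "Protected-With-Warnings"
    NOT_PROTECTED = "Not-Protected"


class Criticality(Enum):
    CRITICAL = "Critical"
    STANDARD = "Standard"
    EXCLUDED = "Excluded"


@dataclass
class RGraphNode:
    """One entity in the R-Graph: a source object or a (possibly nested) group."""
    name: str
    node_type: str                       # e.g. "vm", "saas", "group"
    rpo_minutes: Optional[int] = None    # Recovery Point Objective
    rto_minutes: Optional[int] = None    # Recovery Time Objective
    retention: Optional[str] = None      # retention policy
    protected: Protected = Protected.NOT_PROTECTED
    compliant: bool = False
    criticality: Criticality = Criticality.STANDARD
    children: List["RGraphNode"] = field(default_factory=list)
```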
[00137] The R-Graph may also implement certain rules for propagation of their individual protected and compliance status of "child” members of a larger hierarchy or group.
[00138] For example, rules for “Propagation of Protected Status” may include:
[00139] All children with Criticality='Critical' must be protected to have the group status protected.
[00140] At least one child with Criticality='Standard' must be protected in order to propagate the group status as protected.
[00141] Protected status of a child with Criticality='Excluded' does not affect the status of the group.
[00142] Fig. 10 is an example of a table 10-600 that describes the propagation logic 5-104 for propagation of protected status, depending on the criticality of the resource.
[00143] More particularly, example rules for propagation of compliant status may include:
[00144] All children with Criticality='Critical' must have 'Compliant' status in order to propagate 'compliant' to the group.
[00145] All children with Criticality='Standard' and of type source_group must have 'Compliant' status in order to propagate 'compliant' to the group.
[00146] At least one child with Criticality='Standard' and of type source_object must have 'Compliant' status in order to propagate 'compliant' to the group.
[00147] Children with Protected='Not Protected' should propagate to 'not-protected', as should any with Protected Warnings='Not Protected'.
[00148] The rules above describe examples of the complete logic for propagation of compliant status in the environment. Note that entries with Criticality='Excluded' do not affect the score.
[00149] It should be understood that additional or other compliance rules are possible. For example, Criticality may include Critical (Protected) and Standard (Not Protected) and a Propagation rule may include Protected with Warnings.
[00150] In addition, it is possible that the logic may include other conditions. For example, the user may be able to set criteria (such as equal to or greater than 50% of child resources) to trigger non-compliance or a warning to a higher level.
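A minimal sketch of the example propagation rules above is given below. It assumes node records are plain dictionaries with boolean protected/compliant fields (the three-state protected status is simplified to a boolean), and the field names are hypothetical; it does not implement the optional threshold-based extensions just described.

```python
def propagate(node: dict) -> dict:
    """Recursively apply the example propagation rules to a group node.

    Node shape (illustrative): {'criticality', 'protected', 'compliant',
    'is_group', 'children'}. Children with criticality 'Excluded' are ignored.
    """
    children = [propagate(c) for c in node.get("children", [])]
    considered = [c for c in children if c["criticality"] != "Excluded"]
    if considered:
        critical = [c for c in considered if c["criticality"] == "Critical"]
        standard = [c for c in considered if c["criticality"] == "Standard"]

        # Protected: all critical children protected, and at least one standard
        # child protected (when standard children exist).
        node["protected"] = (
            all(c["protected"] for c in critical)
            and (not standard or any(c["protected"] for c in standard))
        )

        # Compliant: all critical children compliant; all standard groups compliant;
        # at least one standard source object compliant (when such objects exist).
        std_groups = [c for c in standard if c.get("is_group")]
        std_objects = [c for c in standard if not c.get("is_group")]
        node["compliant"] = (
            all(c["compliant"] for c in critical)
            and all(c["compliant"] for c in std_groups)
            and (not std_objects or any(c["compliant"] for c in std_objects))
        )
    return node
```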
[00151] Fig. 11 is a resulting R-Graph 11-700 that shows how Criticality and Compliance propagate for a particular organization's resources. The rectangles represent managed resources, each of which may represent individual or groupings of applications or services, or a department's managed resources, or groups of applications. In the event a color version of this drawing is available, criticality is indicated by the color of each rectangle, with blue indicating standard criticality and orange indicating critical. Otherwise the color of each rectangle is indicated in the words next to it. The view may be of a particular department's resources or the entire enterprise.
[00152] In this example, at “Level 1” 11-701, a “critical/compliant” resource 11-710 (shown with an orange rectangle labeled “C”) and a “standard/non-compliant” resource 11-711 (shown with a blue rectangle labeled “N”) propagate to a “critical/non-compliant” resource 11-712 (an orange rectangle labeled “N”). At “Level 2” 11-702, a pair of “critical/compliant” resources 11-720, 11-721 propagate to “critical/compliant” resource 11-722. “Level 3” 11-703 is a department that has a standard/compliant resource 11-730 and a standard/non-compliant resource 11-731 but which propagates to a “critical/compliant” resource 11-732 because of the nature of the department that Level 3 supports (perhaps it is the Sales department).
[00153] Thus someone viewing the graph at Level 2 would conclude that department’s managed resources are “critical and compliant”. However, someone viewing the R-Graph at Level 1 would conclude that department’s resources are “critical and non-compliant”, potentially exposing that immediate action may be needed.
[00154] The propagation logic 5-104 similarly processes the other Levels 11-704, 11-705, 11-706, 11-707 to generate the resulting overall view 11-700.
[00155] The overall view 11-700 of the propagated statuses of the entire organization exposes an overall non-compliant/critical status 11-760, which may prompt some immediate action.
[00156] “Protection” and “compliance” propagation may not be absolute (e.g., they need not always resolve to a “yes” or “no” answer). For example, different weightings or scales may be configured at different levels.
[00157] A process for generating an R-Graph representation of the Criticality and Compliance status of a particular enterprise’s data resources can now be appreciated. One such example process is shown in Fig. 12.
[00158] In a first step 12-802, a model of the data processing resources in an enterprise is generated.
[00159] In step 12-804, an R-Graph is generated from the model as explained above.
[00160] Starting at step 12-806, this involves, for each object in the R-Graph, determining a criticality attribute (step 12-808) and a compliance attribute (step 12-810).
[00161] In step 12-812, other attributes for the object(s) may also be determined.
[00162] In step 12-814 these attributes are recorded for the object, and processing returns to step 12-806 until all objects are processed.
[00163] At some later point in time a user wishes to access the R-Graph in step 12-820.
[00164] The propagation logic is applied to the R-Graph in step 12-822.
[00165] The resulting propagated attributes are then displayed in step 12-824.
[00166]
[00167] API Model for Managing Compliance Status
[00168]
[00169] An example implementation of a method for recovering a wide range of services, such as a SaaS application, that supports different levels of granularity is now described. It should be understood, however, that the same general approach can be applied to provide a structured way to describe any service (e.g., SaaS, PaaS, DBaaS, IaaS) related data and configuration and the associated data resilience management methods. The approach enables the ability to protect, and most importantly recover, the data in the service and associated configurations with minimal complexity.
[00170] Data organization within a typical simple SaaS service
[00171] From the perspective of data protection, in one embodiment a typical SaaS service operates in a computing environment that deploys one or more access points to users via an Application Programming Interface (API). The access points consist of various resources behind which data is organized in a hierarchical fashion.
[00172] Fig. 13 shows the access points 13-100 and data organization 13-110 for an example Google Cloud SQL SaaS application 13-120. This example has access points that include a relational database instance 13-102 that includes three relational databases (Database #1 13-104-1, Database #2 13-104-2, and Database #3 13-104-3); each database 13-104 may be a separate access point. The databases 13-104 are each further organized into sets of tables 13-106, with each database having a certain number of tables. For example, database 13-104-1 has tables 13-106-1-1 and 13-106-1-2. The databases 13-104 do not all have the same number of tables.
[00173] Each table 13-106 may include structured rows of billions of data objects. Looking at this from a data protection perspective, granular recovery is not required by this particular user (that is, recovery of individual specific tables), because of complex relationships between the data in the rows/columns. Recovery of the entire database is sufficient.
[00174] In this example, based on the data organization attributes 13-110, the sufficient granularity requirements for data protection are:
[00175] backup of the whole service or a selected database
[00176] recovery of the whole service or a selected database
[00177]
[00178] Example of Complex Data Organization within a SaaS Service
[00179]
[00180] Fig. 14 shows the access points 14-200 and data organization 14-210 for a more complex SaaS application 14-220, such as DropBox.
[00181] This DropBox application 14-220 hosts data for several different Projects 14-201-1, 14-201-2, 14-201-3, 14-201-4, 14-201 -5; the data for each project 14-201 is further organized in folders 14-203 and fdes 14-205 (or other objects such as databases 14-202) in a hierarchical directory. The files 14-205 are typically organized such that there is a folder 14-203 for each project 14-201, and each project 14-201 in turn has data stored in many subfolders. Each subfolder may contain files, databases, or other objects related to some specific aspect of its associated project (here project 14-201-1 is indicated as the individual databases of .PNG files). The data service points thus also correspond to different levels of this hierarchy, such the different Projects (folders) and subfolders. Granular recovery, say of each project or even of each individual file, is important in this particular access point instance for this particular user. [00182] In this case the data organization 14-210 indicates requirements for data protection include:
[00183] backup of the whole service or only a selected project
[00184] recovery of the whole service or only a selected project
[00185] recovery of only a specific single item/object within the service
[00186] To protect this SaaS access point 14-200 appropriately, a metadata catalog is maintained that reflects the one or more levels at which recovery is needed.
[00187]
[00188] Split Catalog of protected items in order to provide granular recovery for any SaaS service
[00189] As shown in Fig. 15, a platform called the R-Cloud Platform 15-310 manages a catalog 15-320 that represents the data organization 13-110, 14-210 for each SaaS 15-340-1, 15-340-2, 15-340-3 in use by the enterprise. Note that the R-Cloud platform may be accessed by a user 15-350 via an API 15-360; in addition, each SaaS provides the aforementioned access points via its own respective API 15-335-1, 15-335-2, 15-335-3. The catalog 15-320 stores metadata regarding the organization and attributes of the data objects within one or more SaaS services operated by an enterprise. The catalog 15-320 is maintained by the R-Cloud platform 15-310 in cooperation with a number of plug-in modules 15-330. There is, for example, an R-Cloud plug-in module 15-330 for each SaaS 15-340.
[00190] Although an R-Cloud module 15-330 is provided for each SaaS application 15-340, the end user 15-350 need only interact with the R-Cloud platform 15-310, such as via the API 15-360. Thus, in the preferred approach, the developer of the R-Cloud plug-in module 15-330 for a given SaaS can determine whether the SaaS itself provides a recovery method, and at which level of granularity those recovery method(s) are provided. This information can then be recorded in the metadata catalog 15-320 via the R-Cloud plug-in modules 15-330.
[00191] More particularly, each item in the catalog, be it a set of data objects (such as a database) or even a single data object (such as an individual file), has an associated data protection attribute. In some examples, the data protection attribute may be a »canBackup« attribute that indicates whether the object can be independently backed up, such as by accessing the APIs of the respective SaaS service 15-340. The R-Cloud Modules 15-330 thus access the catalog 15-320 to determine how to implement data protection for each object.
[00192] For example, a data object that can be backed up by the service itself (such as a database in a CloudSQL application) may have its »canBackup« attribute set to True.
[00193] In the case of a simple SaaS such as Cloud SQL 15-340-1, each object such as a database may have its »canBackup« property set to True. This means that the R-Cloud Platform 15-310 can invoke the R-Cloud plug-in 15-330-1 and then know to concurrently call the SaaS-specific backup method via the Cloud SQL 15-340-1 API 15-335-1. This frees the end user from having to know the backup or other data protection capabilities of each specific SaaS.
[00194] However, in the case of a SaaS application that hosts millions of objects (for example Dropbox 15-340-2, SalesForce 15-340-3, etc.), and where per-object granularity is required, the R-Cloud Module may return False for the attribute »canBackup«.
[00195] Thus, for a simple SaaS, the metadata catalog may have only a single attribute. However, for a more complex SaaS, the metadata catalog may have many attributes that are exposed via API 15-360 to the end user / developer 15-350. For example, the R-Cloud platform 15-310 may enable end users / developers 15-350 to browse the catalog 15-320 via the R-Cloud Modules 15-330 by invoking a List operation to discover the backupable items. The end user/developer 15-350 may then further augment the metadata in the catalog 15-320 with information that describes a backup workflow that is outside the SaaS APIs themselves.
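The following is a minimal, assumption-laden sketch of that browse-and-augment step. The `catalog` shape, the module's `list` method, and the `restore_workflow` record are hypothetical names introduced only to show where user-supplied workflow metadata could be attached for items the SaaS cannot protect individually.

```python
def plan_recovery(catalog: dict, r_cloud_module, project_id: str) -> dict:
    """Browse a backed-up project via the module's list method, then record a
    user-defined restore workflow for items not individually protectable by the SaaS.

    All names (catalog layout, module API, workflow record) are illustrative.
    """
    items = r_cloud_module.list(project_id)          # discover what was actually backed up
    for item in items:
        entry = catalog.setdefault(item["id"], dict(item))
        if not entry.get("canBackup"):
            # Not individually protectable via the SaaS API: attach an external
            # restore workflow supplied by the user/developer.
            entry["restore_workflow"] = {
                "method": "platform_restore",
                "order": entry.get("restoreSeqGroup", 2),
            }
    return catalog
```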
[00196] If we refer back to Fig. 14 (Description of Complex Data Organization within a SaaS Service such as DropBox 15-340-2):
[00197] Up to the Project 14-201 level the user 15-350 is able to define the backup granularity.
[00198] Information (metadata or attributes) about the items/data contained within each Project 14-201 at the backup time will be stored in the catalog 15-320 by the R-Cloud Module 15-330 together with backup data during the backup operation. All of the items below the Project level 14-201 (such as folders 14-203 and files 14-205) have their »canBackup« attribute set to False, meaning that they cannot be separately selected/unselected for backup.
[00199] Within the R-Cloud platform's 15-310 API 15-360, the user 15-350 can select a specific project 14-201 (and its backup version) and invoke a browse option (which operates a list method of the R-Cloud Module 15-330-2 for the SaaS 15-340-2). This enables the user to
discover which items were also actually backed up when respective Project was backed up by the SaaS.
[00200] After browsing and selecting low level items, an available data restore workflow operation (outside of the SaaS 15-340-2 itself) may be individually specified by the user via the API 15-360 and recorded in the catalog 15-320. This workflow can then be invoked by the R-Cloud platform 15-310 via the R-Cloud Plug-in module 15-330-2 when recovery is requested. Other aspects of recovery, such as data dependencies (recovery order), may also be recorded as metadata in the catalog 15-320.
[00201] It can now be understood that with this approach, the SaaS-specific functions are implemented in the R-Cloud Plug-ins 15-330, freeing the end user 15-350 from having to understand the specifics of whether each SaaS 15-340 implements a data protection scheme and to what extent.
[00202] The system can now invoke an automatic recovery workflow and be assured that the appropriate method for each SaaS will be invoked, regardless of the structure of data objects. The backup may be done entirely by the respective SaaS (such as in the CloudSQL 15-340-1 example above), or completely managed by the R-Cloud plugin module 15-330-2 discovering the appropriate methods (such as in the DropBox 15-340-2 example), or some mix of the two. The R-Cloud plug-ins 15-330 use the catalog 15-320 to understand the specifics of each SaaS service's 15-340 backup abilities, freeing the end user 15-350 from having to know these details.
[00203] Fig. 16 is a more detailed depiction of the R-Cloud Platform 15-310 and how the API 15-360 can be leveraged by the user 15-350. In general, discovery of each SaaS is performed to determine if it has a corresponding data protection method and its attributes, such as a backup method, restore method, configuration method, or status method. Other information, such as lists of required attributes and optional attributes, is also collected. The specifics of each method and list of attributes differ depending on the type of SaaS application 15-340.
[00204] More particularly, the R-Cloud Platform includes an R-Cloud Manager 16-410 component, a Service Data Definition 16-420, a Service Data Management 16-430, and the R-Cloud Modules 15-330. Each R-Cloud Module 15-330 is programmed to access its associated
SaaS application 15-340 such as through an Application Programming Interface (API) 15-335. There is a different API 15-335 for each SaaS application 15-340.
[00205] The Service Data Definition 16-420 consists of methods that include an authentication method 16-422 and a discovery method 16-424. These methods are used to discover attributes of a SaaS 15-340 resource, such as during a LIST operation. Each such LIST operation may return a [00206] list that describes certain aspects of the structure of the SaaS application. The structure may identify a list of required attributes that the R-Cloud platform 15-310 will then use to drive backup and restore methods, as well as an optional list of other attributes.
[00207] Service Data Management 16-430 may include methods for defining backup options 16-432, backup execution 16-434, defining recovery services 16-436 and recovery execution 16- 438.
[00208] As shown in Fig. 16, the attributes 15-370 discovered for each SaaS may include values for an identifier, name, and SaaS type. Also discovered may be attributes such as whether or not the SaaS has other related dependent or subservient data objects, provides its own backup method, defines a backup sequence, or defines a restore sequence. Still other attributes may include whether the SaaS can display metadata, its location, and other metrics.
[00209] An example of a discovered attribute for a SaaS is its »canBackup« attribute. This indicates to the R-Cloud platform 15-310 that the SaaS implements a native backup method.
[00210] Example optional attributes may further qualify the »canBackup« attribute to specify, at one or more levels of a data hierarchy, whether backup protection is available. For an example CloudSQL SaaS, the »hasSubResources« attribute can be set to True. The child resources may be further defined as optional attributes, such as a list of Cloud SQL servers, a list of SQL instances running on each server, a list of databases running on each SQL instance, and a list of tables in each database. The optional attributes may further specify a »canBackup« attribute for each object in the list, such that it can be determined whether each server, instance, database, and table can or cannot be backed up at its corresponding level.
[00211] In the illustrated example of a DropBox SaaS, the catalog indicates that the data objects 380 include a file structure that has a root (top level) folder that has SubResources. Resource A is itself a folder that has SubResource C. Resource C does not have any child resources. The hasSubResources attribute for object B also indicates that it does not have any child resources.
[00212]
[00213] Example for Cloud SQL
[00214]
[00215] In the case of a SaaS resource such as Cloud SQL, the catalog entries for a particular server may have a »canBackup« property set to True and other child properties (such as for a tables level) set to False. This means that at backup time the R-Cloud platform will know to concurrently call the R-Cloud module for each backup-enabled resource as:
[00216]
[00217] backup(instance, options)
backup(database-1, options)
[00218] ...
[00219] backup(database-n, options)
[00220]
[00221] In a case where certain operations need to occur in a certain order, a developer can use other attributes, such as the backupSeqGroup and/or restoreSeqGroup attributes, to control the order of operations. In the case of Cloud SQL, the instance resource type may have restoreSeqGroup set to 1 and all database resources will have restoreSeqGroup set to 2. This means that the R-Cloud platform will execute:
[00222]
[00223] restore(instance, options)
[00224]
[00225] and wait for this instance of the restore operation to finish before then concurrently executing the database restore operations:
[00226] restore(database-1, options)
[00227] ...
[00228] restore(database-n, options)
[00229]
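As an illustrative sketch of how such sequencing might be scheduled (the resource records and the `restore` callable are hypothetical stand-ins, not a platform API), groups can be run in ascending restoreSeqGroup order with the members of each group executed concurrently:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_restore_in_sequence(resources, restore):
    """Run restore operations grouped by restoreSeqGroup: groups in ascending
    order, members of each group concurrently. Inputs are illustrative."""
    groups = defaultdict(list)
    for res in resources:
        groups[res.get("restoreSeqGroup", 1)].append(res)
    for seq in sorted(groups):
        with ThreadPoolExecutor() as pool:   # concurrent within one sequence group
            list(pool.map(lambda r: restore(r["name"], r.get("options", {})),
                          groups[seq]))


# Example: the Cloud SQL instance (group 1) is restored before its databases (group 2).
resources = [
    {"name": "instance", "restoreSeqGroup": 1},
    {"name": "database-1", "restoreSeqGroup": 2},
    {"name": "database-2", "restoreSeqGroup": 2},
]
# run_restore_in_sequence(resources, restore=lambda name, opts: print("restore", name))
```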
[00230] Fig. 17 is a high level flow diagram of a method that may be used to implement aspects of this data protection scheme using the system and features as described above.
[00231] In a first step 17-502, a set of services are identified.
[00232] Next in step 17-504, access points for each given service are determined.
[00233] In step 17-506, for each access point, data objects are determined. For example, the data objects in the case of a SQL service may include one or more databases. However, for a Dropbox service, the granularity of the data objects may be projects, folders and files. Other services may have other types of data objects.
[00234] Next, in step 17-508, a data protection attribute for each data object is determined. As explained above, this may include a »canBackup« attribute for that data object.
[00235] This information is then stored in a catalog in the next step 17-510.
[00236] At 17-512, this process flow continues for all objects and access points in all services.
[00237] Step 17-520 represents some later time at which a user may browse the catalog and in step 17-522 select an access point or an object and review or change its available data protection status.
[00238] At step 17-530, which is some later time still, a recovery workflow is invoked. The catalog may thus be accessed at step 17-532 to discover the data protection schemes in use, which are then instantiated at step 17-534. Note that this catalog is configured and maintained outside of the SaaS itself, even if a given service itself provides protection.
[00239]
[00240]
[00241] Further Implementation Options
[00242] It should be understood that the workflow of the example embodiments described above may be implemented in many different ways. In some instances, the various “data processors” may each be implemented by a physical or virtual or cloud-based general purpose computer having a central processor, memory, disk or other mass storage, communication interface(s), input/output (I/O) device(s), and other peripherals. The general-purpose computer is transformed into the processors and executes the processes described above, for example, by loading software instructions into the processor, and then causing execution of the instructions to
carry out the functions described.
[00243] As is known in the art, such a computer may contain a system bus, where a bus is a set of hardware lines used for data transfer among the components of a computer or processing system. The bus or busses are essentially shared conduit(s) that connect different elements of the computer system (e.g., one or more central processing units, disks, various memories, input/output ports, network ports, etc.) that enables the transfer of information between the elements. One or more central processor units are attached to the system bus and provide for the execution of computer instructions. Also attached to system bus are typically I/O device interfaces for connecting the disks, memories, and various input and output devices. Network interface(s) allow connections to various other devices attached to a network. One or more memories provide volatile and/or non-volatile storage for computer software instructions and data used to implement an embodiment. Disks or other mass storage provides non-volatile storage for computer software instructions and data used to implement, for example, the various procedures described herein.
[00244] Embodiments may therefore typically be implemented in hardware, custom designed semiconductor logic, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), firmware, software, or any combination thereof.
[00245] In certain embodiments, the procedures, devices, and processes described herein are a computer program product, including a computer readable medium (e.g., a removable storage medium such as one or more DVD-ROM's, CD-ROM's, diskettes, tapes, etc.) that provides at least a portion of the software instructions for the system. Such a computer program product can be installed by any suitable software installation procedure, as is well known in the art. In another embodiment, at least a portion of the software instructions may also be downloaded over a cable, communication and/or wireless connection.
[00246] Embodiments may also be implemented as instructions stored on a non-transient machine-readable medium, which may be read and executed by one or more procedures. A nontransient machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a nontransient machine-readable medium may include read only memory (ROM); random access
memory (RAM); storage including magnetic disk storage media; optical storage media; flash memory devices; and others.
[00247] Furthermore, firmware, software, routines, or instructions may be described herein as performing certain actions and/or functions. However, it should be appreciated that such descriptions contained herein are merely for convenience and that such actions in fact result from computing devices, processors, controllers, or other devices executing the firmware, software, routines, instructions, etc.
[00248] It also should be understood that the block and system diagrams may include more or fewer elements, be arranged differently, or be represented differently. But it further should be understood that certain implementations may dictate the block and network diagrams and the number of block and network diagrams illustrating the execution of the embodiments be implemented in a particular way.
[00249] Embodiments may also leverage cloud data processing services such as Amazon Web Services, Google Cloud Platform, and similar tools.
[00250] Accordingly, further embodiments may also be implemented in a variety of computer architectures, physical, virtual, cloud computers, and/or some combination thereof, and thus the computer systems described herein are intended for purposes of illustration only and not as a limitation of the embodiments.
[00251] The above description has particularly shown and described example embodiments. However, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the legal scope of this patent as encompassed by the appended claims.
Claims
1. An automatic Software as a Service (SaaS) resource discovery method comprising: accessing an identity authentication service to obtain access credentials for the SaaS resource; accessing the SaaS resource using the access credentials; and discovering data protection attributes specific to the SaaS resource.
2. The method of claim 1 wherein the method is triggered either as a scheduled job or when a new SaaS is integrated with the identity service or when initiated by a user or a service.
3. The method of claim 1 where the identity service is a Single Sign On (SSO) service.
4. The method of claim 1 wherein the data protection attributes comprise information specific to the SaaS resource, including one or more of a backup method, the default data protection built into the service, a restore method, a status method, or a configuration method.
5. The method of claim 1 additionally wherein the data protection attributes for the resource include one or more of default data protection, canBackup, hasChildResources, backupSeqGroup, or restoreSeqGroup.
6. The method of claim 5 wherein the canBackup attribute indicates one or more child levels of the resource for which backup is implemented at each level.
7. A method for assessing data resilience status of a data processing environment comprising: generating a model of data processing resources within the data processing environment; generating a data resilience-graph (R-graph) based on the model, the R-graph
including an object for each resource, the object for each resource further including at least a compliance attribute and a criticality attribute; applying compliance and criticality rules to the objects in the R-graph; and displaying a domain-specific view of the R-graph.
8. The method of claim 7 additionally wherein the R-graph consists of a hierarchy of objects, and the domain-specific view is further generated by applying inheritance attributes of the compliance and criticality rules to the hierarchy of objects.
9. The method of claim 8 wherein the domain-specific view is for an entire enterprise, a department within the enterprise, an application, or a service.
10. A method for automatic data protection for one or more resources provided by a service, wherein the service operates in a computing environment that deploys one or more access points to which Application Programming Interface (API)-based requests are directed, and wherein the service resources relate to data objects arranged at one or more levels of a hierarchy, the method comprising: responsive to receipt of an API request for data protection, discovering data objects accessed by the service resource; discovering attributes specific to the data objects, including a data protection attribute that indicates whether a data protection method is accessible to protect the data objects via the service resource at one or more levels of the hierarchy; obtaining information for use with another data protection method that is other than via the service resource; and executing a granular data protection process by accessing the data protection attribute information for each data object and, when the data protection attribute is true, invoking the data protection method accessible via the service resource;
else, when the data protection attribute is false, invoking the other data protection method.
11. The method of claim 10 wherein discovering attributes further comprises determining a workflow for invoking a data protection method that is other than via the service resource.
12. The method of claim 10 wherein the step of discovering attributes is implemented in a plugin operating within the service.
13. The method of claim 10 wherein the step of executing the granular data protection process is implemented on a data processing platform outside of the service resource.
14. The method of claim 10 wherein the service is a SaaS, PaaS, DBaaS, or IaaS.
15. The method of claim 10 wherein the data protection attribute indicates whether the service provides backup for the data object.
16. The method of claim 10 wherein the data protection attribute indicates whether the service provides recovery of the data object.
17. An apparatus for assessing compliance status of a data processing environment comprising: one or more data processors; and one or more computer readable media including instructions that, when executed by the one or more data processors, cause the one or more data processors to perform a process for: generating a model of data processing resources within the environment; generating a data resilience-graph (R-graph) based on the model, the R-graph including an object for each resource, the object for each resource further including at least a compliance attribute and a criticality attribute;
applying compliance and criticality rules to the R-graph; and displaying a domain-specific view from the R-graph.
18. The apparatus of claim 17 wherein the R-graph consists of a hierarchy of objects, and the domain-specific view is further generated by applying inheritance attributes of the compliance and criticality rules to the hierarchy of objects.
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363442138P | 2023-01-31 | 2023-01-31 | |
| US202363442139P | 2023-01-31 | 2023-01-31 | |
| US202363442140P | 2023-01-31 | 2023-01-31 | |
| US63/442,140 | 2023-01-31 | ||
| US63/442,138 | 2023-01-31 | ||
| US63/442,139 | 2023-01-31 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024163394A1 true WO2024163394A1 (en) | 2024-08-08 |
Family
ID=92147535
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/013443 Pending WO2024163394A1 (en) | 2023-01-31 | 2024-01-30 | Enabling service data protection through discovery, r-graph propagation and API model |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024163394A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190036910A1 (en) * | 2015-03-16 | 2019-01-31 | Convida Wireless, Llc | End-to-end authentication at the service layer using public keying mechanisms |
| US20190332485A1 (en) * | 2016-12-15 | 2019-10-31 | Nutanix, Inc. | Rule-based data protection |
| WO2020106973A1 (en) * | 2018-11-21 | 2020-05-28 | Araali Networks, Inc. | Systems and methods for securing a workload |
| US20220318059A1 (en) * | 2021-03-31 | 2022-10-06 | Amazon Technologies, Inc. | Distributed decomposition of string-automated reasoning using predicates |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24750817; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |