WO2025207082A1 - Systems and methods for an automatic differential privacy policy - Google Patents
Systems and methods for an automatic differential privacy policy
- Publication number
- WO2025207082A1 (PCT/US2024/021497)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- logging data
- differential privacy
- client devices
- configuration file
- privacy parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/02—Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0407—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
Definitions
- a computer-implemented method includes receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter.
- the method further includes adjusting the initial differential privacy parameter based on an analysis of the received logging data.
- the method additionally includes generating a configuration file comprising the adjusted differential privacy parameter.
- the method further includes sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
- Another advantage of an automated approach is that it enables automatic updates for different features at different times. Additionally, collection of logging data can impact compute resources and may negatively impact the running of software applications. Accordingly, tailoring a collection and/or transmission of logging data to an intended use can result in improvements in battery performance, use of memory resources (e.g., for storing logging data, and/or updating newer versions of an app to change a differential privacy policy, etc.), use of network resources (e.g., for transmitting logging data), among others. Enhancing the actual and/or perceived quality of logging data collected can provide benefits to the software apps and/or the developer community that depend on a quantitative measure of usage, functionality, redundancy, and so forth, as relates to the software app. The techniques described herein are dynamic, and so can apply to an entire ecosystem of apps and developers.
- a user may load arbitrary data to a server, possibly logged in a non-differentially private manner.
- the uploaded data may then be queried at the server-side in a differentially private manner.
- the user has to trust that the server operations are appropriately secure; however, the data could be vulnerable to unintentional misuse.
- Another approach may be to apply differential privacy at the client-side, but these approaches are generally non-trivial to use and configure.
- the approach described herein addresses these challenges by adding differential privacy at the client-side to collect differentially private logging data, as opposed to adding differential privacy at the server-side.
- the user and/or the developer do not have to be involved in the use or configuration of the differential privacy parameters. This eliminates a need for a user or a developer to understand and/or use the complex technical underpinnings of differential privacy.
- the approach results in several other advantages. For example, a hostile actor eavesdropping on a client device may be thwarted in their attempts to collect information when the logging data is collected at the client-side in a differentially private manner. As another example, a user of the client device may be able to eavesdrop on the logging data to verify some extent of the differential privacy behavior.
- Differential privacy platform 130 may access, and/or retrieve the logging data from logging data collector 120, and perform analyses. For example, differential privacy platform 130 may, at block 140, perform operations to analyze the received logging data. Such analysis may involve generating ε- and (ε, δ)-differentially private statistics over datasets (e.g., logging data collector 120).
- Differential privacy platform 130 may, at block 150, perform operations to determine a differential privacy policy.
- the adjusting of the initial differential privacy parameter may involve determining an amount of noise to be added to the logging data.
- one approach to preserving ε-differential privacy is an addition of Laplacian noise to original data for information release.
- Another approach may involve (ε, δ)-differential privacy.
- the parameter ε represents a privacy degree and the parameter δ represents a probability of violating privacy. For both parameters, smaller values may correspond to higher privacy.
- an IP address may be reported as logging into the email application sixteen (16) times a day; however, there is no way to determine whether the count “16” is an actual count, or is plus or minus a random number from the actual count. For example, the IP address may have logged in between 6 and 26 times, where a random number between -10 and 10 may be added to the actual login count.
- the adjusting of the initial differential privacy parameter may involve determining a contribution bounding to be added to the logging data.
- the term “contribution bounding” may generally refer to a process of limiting contributions by a single individual (or an entity represented by a privacy key) to the output dataset or its partition.
- the differential privacy parameter may be based on a maximum contribution that may be made by a single user (or client device).
- the differential privacy parameter may be based on a minimum contribution that may be made by a single user (or client device). For example, in measuring screen time usage, a minimum value may be set to 10 minutes. Accordingly, in the event a client device logs a screen time of less than 10 minutes, the logging data may be reported as 10 minutes.
- Differential privacy may also involve using logging data from different subsets of a group of client devices. For example, in the event 90% of the client devices in a group may be configured to collect and provide logging data, the client devices that constitute the 90% may be changed at regular intervals. Such changes are not discernible to the logging system. The data received is from 90% of the client devices, without insight into changes in the group of client devices from which the logging data is received.
- Differential privacy platform 130 may, at block 160, perform operations to generate a configuration (config) file with a differential privacy policy. As described herein, instead of manually tuning the collection of logging data, such changes may be performed in a scalable, automated manner.
- the configuration file may include configuration settings that may be read during a runtime for the software application.
- the configuration file may be configured in extensible markup language (XML), YAML Ain't Markup Language (YAML), JavaScript Object Notation (JSON), and so forth.
- the configuration file may include the parameters and settings for the differential privacy policy.
- the adjusting of the initial differential privacy parameter may be performed based on the aggregated statistics, S1, S2, ..., SK, associated with the one or more columns of the second plurality of columns, C1, C2, ..., CK.
- one or more columns of the second plurality of columns, C1, C2, ..., CK may be associated with a differential privacy parameter D1, D2, ..., DK.
- the differential privacy parameter D1, D2, ..., DK may include noise, sensitivity, clamping bounds, contribution limits, and so forth.
- the differential privacy parameter may be determined for one or more columns.
- the differential privacy parameter may be determined for a portion or an entirety of a database.
- the non-private sample group would enable a periodic repetition of a derivation of parameters u_c, b_c for the Laplacian noise.
- in the event the updated statistics (updated parameters u_c, b_c) for the adjusted logging data are determined to be substantively similar to the prior statistics (prior parameters u_c, b_c) corresponding to the prior logging data, no further action may be needed.
- a user may be provided with controls allowing the user to make an election as to both if and when systems, programs, or features described herein may enable collection of logging data, and if the user is sent content or communications from a server.
- certain data may be treated in one or more ways before it is stored or used, so that personal data is removed, secured, encrypted, and so forth.
- a purpose of differential privacy is to enable a user’s identity to be treated in a manner so that no user data can be determined for the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined.
- programmable devices can be indirectly connected to network 306 via an associated computing device, such as programmable device 304c.
- programmable device 304c can act as an associated computing device to pass electronic communications between programmable device 304d and network 306.
- a computing device can be part of and/or inside a vehicle, such as a car, a truck, a bus, a boat or ship, an airplane, etc.
- a programmable device can be both directly and indirectly connected to network 306.
- server devices 308 and/or 310 can provide programmable devices 304a-304e with access to software for database, search, computation, graphical, audio, video, World Wide Web/Internet utilization, and/or other functions. Many other examples of server devices are possible as well.
- User interface module 401 can be operable to send data to and/or receive data from external user input/output devices.
- user interface module 401 can be configured to send and/or receive data to and/or from user input devices such as a touch screen, a computer mouse, a keyboard, a keypad, a touch pad, a trackball, a joystick, a voice recognition module, and/or other similar devices.
- User interface module 401 can also be configured to provide output to user display devices, such as one or more cathode ray tubes (CRT), liquid crystal displays, light emitting diodes (LEDs), displays using digital light processing (DLP) technology, printers, light bulbs, and/or other similar devices, either now known or later developed.
- User interface module 401 can also be configured to generate audible outputs, with devices such as a speaker, speaker jack, audio output port, audio output device, earphones, and/or other similar devices. User interface module 401 can further be configured with one or more haptic devices that can generate haptic outputs, such as vibrations and/or other outputs detectable by touch and/or physical contact with computing device 400. In some examples, user interface module 401 can be used to provide a graphical user interface (GUI) for utilizing computing device 400.
- network communications module 402 can be configured to provide reliable, secured, and/or authenticated communications.
- information for facilitating reliable communications (e.g., guaranteed message delivery) can be provided, perhaps as part of a message header and/or footer (e.g., packet/message sequencing information, encapsulation headers and/or footers, size/time information, and transmission verification information such as cyclic redundancy check (CRC) and/or parity check values).
- One or more processors 403 can include one or more general purpose processors, and/or one or more special purpose processors (e.g., digital signal processors, tensor processing units (TPUs), graphics processing units (GPUs), application specific integrated circuits, etc.).
- One or more processors 403 can be configured to execute computer-readable instructions 406 that are contained in data storage 404 and/or other instructions as described herein.
- Data storage 404 can include one or more non-transitory computer-readable storage media that can be read and/or accessed by at least one of one or more processors 403.
- the one or more computer-readable storage media can include volatile and/or non-volatile storage components, such as optical, magnetic, organic or other memory or disc storage, which can be integrated in whole or in part with at least one of one or more processors 403.
- data storage 404 can be implemented using a single physical device (e.g., one optical, magnetic, organic or other memory or disc storage unit), while in other examples, data storage 404 can be implemented using two or more physical devices.
- Data storage 404 can include computer-readable instructions 406 and perhaps additional data.
- data storage 404 can include storage required to perform at least part of the herein-described methods, scenarios, and techniques and/or at least part of the functionality of the herein-described devices and networks.
- data storage 404 can include storage for a logging data module 412 (e.g., to collect and store differentially private logging data based on a configuration file).
- computer- readable instructions 406 can include instructions that, when executed by the one or more processors 403, enable computing device 400 to provide for some or all of the functionality of logging data module 412.
- the method further involves adjusting the initial differential privacy parameter based on an analysis of the received logging data.
- the method also involves generating a configuration file comprising the adjusted differential privacy parameter.
- the method additionally involves sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
- Some embodiments involve receiving, from the plurality of client devices, the modified logging data associated with the software application. Such embodiments also involve readjusting the adjusted differential privacy parameter based on another analysis of the received modified logging data. Such embodiments further involve generating a second configuration file comprising the adjusted differential privacy parameter as readjusted. Such embodiments additionally involve sending the second configuration file to the plurality of client devices.
- the analysis of the received logging data involves determining structured information related to the logging data. Such embodiments also involve determining shape information of the logging data. The adjusting of the initial differential privacy parameter may be based on the structured information and the shape information.
- the logging data may include a usage count.
- the adjusted differential privacy parameter may include a threshold value for a logging of the usage count.
- the received logging data may be arranged in a tabular format comprising a first plurality of rows and a second plurality of columns. Each row of the first plurality of rows may correspond to a client device of the plurality of client devices.
- the adjusting of the initial differential privacy parameter may be performed based on aggregated statistics associated with one or more of the second plurality of columns.
- the Laplacian noise for a given column may be based on a mean and standard deviation of the values in the given column.
- the adjusting of the initial differential privacy parameter may involve determining a contribution bounding to be added to the logging data.
- the adjusting of the initial differential privacy parameter may involve determining a partition selection for the logging data.
- each block and/or communication may represent a processing of information and/or a transmission of information in accordance with example embodiments.
- Alternative embodiments are included within the scope of these example embodiments.
- functions described as blocks, transmissions, communications, requests, responses, and/or messages may be executed out of order from that shown or discussed, including substantially concurrent or in reverse order, depending on the functionality involved.
- more or fewer blocks and/or functions may be used with any of the ladder diagrams, scenarios, and flow charts discussed herein, and these ladder diagrams, scenarios, and flow charts may be combined with one another, in part or in whole.
- a block that represents a processing of information may correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique.
- a block that represents a processing of information may correspond to a module, a segment, or a portion of program code (including related data).
- the program code may include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique.
- the program code and/or related data may be stored on any type of computer readable medium such as a storage device including a disk or hard drive or other storage medium.
- the computer readable medium may also include non-transitory computer readable media such as non-transitory computer-readable media that stores data for short periods of time like register memory, processor cache, and random access memory (RAM).
- the computer readable media may also include non-transitory computer readable media that stores program code and/or data for longer periods of time, such as secondary or persistent long term storage, like read only memory (ROM), optical or magnetic disks, compact-disc read only memory (CD-ROM), for example.
- the computer readable media may also be any other volatile or nonvolatile storage systems.
- a computer readable medium may be considered a computer readable storage medium, for example, or a tangible storage device.
- a block that represents one or more information transmissions may correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions may be between software modules and/or hardware modules in different physical devices.
- a user may be provided with controls allowing the user to make an election as to both if and when systems, programs, or features described herein may enable collection of user information (e.g., information about a user’s social network, social actions, or activities, profession, a user’s preferences, a user’s demographic information, a user’s current location, or other personal information), and if the user is sent content or communications from a server.
- certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Apparatus and methods related to automatic differential privacy policies are provided. A method includes receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter. The method further includes adjusting the initial differential privacy parameter based on an analysis of the received logging data. The method additionally includes generating a configuration file comprising the adjusted differential privacy parameter. The method further includes sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
Description
SYSTEMS AND METHODS FOR AN AUTOMATIC DIFFERENTIAL PRIVACY POLICY
BACKGROUND
[1] Many modern computing devices, including mobile phones, personal computers, and tablets, include software applications (“apps”) for various products and/or services. Developers of such apps provide various types of functionality, and typically include logging code to quantitatively measure usage. Such logging data may be used to determine an effectiveness of the products and/or services, identify and/or troubleshoot issues related to use of the products and/or services, identify areas of user interest, and so forth. The developers may then improve the offered products and/or services based on the usage data.
SUMMARY
[2] This application generally relates to an automatic application of differential privacy protocols in client devices. Usage data may be collected from client devices and analyzed for the effectiveness and/or utility of various products and services. However, collection and analysis of such usage data is to be performed in a manner that is agnostic to any particular user, and does not reveal an identity of a particular user. Generally speaking, this may be achieved in several ways, including, for example by adding noise to the usage data to protect individual privacy, range-clamping the amount of collected data, defining contribution limits, and adding synthetically generated data. However, addition of the noise may diminish the utility of the usage data. The term “differential privacy” generally refers to a quantitative measure of a trade-off between the added noise and individual privacy.
[3] Different products and services may be associated with different attributes such as a number of users, a number of client devices, frequency of use, user locations, time of usage, a duration of use, and so forth. The usage data may therefore change based on one or more of such attributes. Accordingly, a one-size-fits-all approach to collection and/or analysis of usage data may not be optimal in terms of the trade-off between the added noise and individual privacy. For example, while adding a particular amount or type of noise for one application may facilitate data analysis, the same amount of noise for another application may diminish the utility of the usage data, and/or may not respect individual privacy. Therefore, there is a need for an automatic differential privacy protocol that is tailored to the different attributes of the products and/or services.
[4] In one aspect, a computer-implemented method is provided. The method includes receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter. The method further includes adjusting the initial differential privacy parameter based on an analysis of the received logging data. The method additionally includes generating a configuration file comprising the adjusted differential privacy parameter. The method further includes sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
[5] In another aspect, a computing device is provided. The computing device includes one or more processors and data storage. The data storage has stored thereon computer-executable instructions that, when executed by one or more processors, cause the computing device to carry out functions. The functions include receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter. The functions further include adjusting the initial differential privacy parameter based on an analysis of the received logging data. The functions additionally include generating a configuration file comprising the adjusted differential privacy parameter. The functions further include sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
[6] In another aspect, an article of manufacture is provided. The article of manufacture includes one or more non-transitory computer readable media having computer-readable instructions stored thereon that, when executed by one or more processors of a computing device, cause the computing device to carry out functions. The functions include receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter. The functions further include adjusting the initial differential privacy parameter based on an analysis of the received logging data. The functions additionally include generating a configuration file comprising the adjusted differential privacy parameter. The functions further include sending the configuration file to
the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
[7] In another aspect, a system is provided. The system includes means for receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter; means for adjusting the initial differential privacy parameter based on an analysis of the received logging data; means for generating a configuration file comprising the adjusted differential privacy parameter; and means for sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
[8] The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the figures and the following detailed description and the accompanying drawings.
BRIEF DESCRIPTION OF THE FIGURES
[9] FIG. 1 illustrates an overall system for automated differential privacy, in accordance with example embodiments.
[10] FIG. 2 illustrates an example tabular format for determination of a differential privacy parameter, in accordance with example embodiments.
[11] FIG. 3 depicts a distributed computing architecture, in accordance with example embodiments.
[12] FIG. 4 is a block diagram of a computing device, in accordance with example embodiments.
[13] FIG. 5 is a flowchart of a method, in accordance with example embodiments.
DETAILED DESCRIPTION
[14] Example methods, devices, and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other
embodiments or features. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein.
[15] Thus, the example embodiments described herein are not meant to be limiting. Aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are contemplated herein.
[16] Further, unless context suggests otherwise, the features illustrated in each of the figures may be used in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment.
Overview
[17] This application generally relates to an automatic application of differential privacy protocols in client devices. Mobile devices generally include software applications (“apps”) for various products and/or services. Developers of such apps provide various types of functionality, and typically include logging code to quantitatively measure usage. As described herein, a developer can opt-in to automatic differential privacy when a logging code for an app is written. As a user interacts with the app, the logging code may log differentially private usage data based on various differential privacy parameters. Such usage data may be provided to a server for further analysis. The server-side system may analyze the collected data and may automatically adjust the differential privacy parameters in several ways, including, for example by adding noise to the usage data to protect individual privacy, range-clamping the amount of collected data, defining contribution limits, and adding synthetically generated data, and so forth. The adjusted differential privacy parameters may be added to a configuration file. The configuration file may then be sent to the client devices, where the logging code can modify future collection of logging data based on the configuration file. Subsequently, the modified logging code at the client device may continue to log differentially private usage data based on the adjusted differential privacy parameters. Logging data may be received during the course of a lifetime of an app, and the differential privacy policy may be updated and/or adjusted accordingly, if needed.
[18] For aggregated metrics, the service core logger may include structured information about the usage data (including potential partitioning keys in the list of dimensions specified
by the developer). Also, the service core logger may include information about the shape of the data being logged. These signals may be combined to generate the differential privacy parameters (e.g., min/max values, contribution limits, epsilon, sensitivity, etc.), which are written to the configuration file and pushed back to the client devices. Once the configuration is received, the client device begins to upload logs that are differentially private.
[19] The aggregated metrics may be based on feature level logging. For example, a “smart compose” feature may include AI-generated response snippets that are provided to a user when the user is composing an email. The logging data can include a number of times the user selects an AI-generated response snippet and uses it without editing it. Another example of logging data can include a number of times the user selects an AI-generated response snippet and uses it with edits. Some other examples can involve usage data for a language-based virtual keyboard (e.g., number of times it is used, a count of the keystrokes, character count, etc.). Another example can involve a number of SMS messages sent from country A to country B. Such logging data can indicate a usefulness of the feature. And differential privacy based on the aggregated metrics can provide usage information without revealing an identity of the individual user.
[20] As such, the herein-described techniques can improve a type of logging data that is to be collected by tailoring it to a device, a particular app, a functionality of the app, and/or a feature thereof. Such an approach prevents an over-collection of logging data that may fall short of minimizing, maintaining and/or providing formal guarantees for user privacy. At the same time, the approach described herein optimizes data collection so that useful information may be extracted.
[21] An automated approach eliminates a need for developers to make changes to their code, provide updated versions, and so forth, which can reduce costs, memory usage, and power consumption. For example, instead of individual developers making adjustments to respective software applications to update a differential privacy policy, a platform-based approach may be utilized that can apply updates to more than one software application.
[22] Another advantage of an automated approach is that it enables automatic updates for different features at different times. Additionally, collection of logging data can impact compute resources and may negatively impact the running of software applications. Accordingly, tailoring a collection and/or transmission of logging data to an intended use can
result in improvements in battery performance, use of memory resources (e.g., for storing logging data, and/or updating newer versions of an app to change a differential privacy policy, etc.), use of network resources (e.g., for transmitting logging data), among others. Enhancing the actual and/or perceived quality of logging data collected can provide benefits to the software apps and/or the developer community that depend on a quantitative measure of usage, functionality, redundancy, and so forth, as relates to the software app. The techniques described herein are dynamic, and so can apply to an entire ecosystem of apps and developers.
Automated Differential Privacy
[23] FIG. 1 illustrates an overall system 100 for automated differential privacy, in accordance with example embodiments. A plurality of client devices, 105(1), 105(2), 105(3), ..., 105(M) are shown. The plurality of client devices, 105(1), 105(2), 105(3), ..., 105(M) may include one or more software applications that are installed in the client devices. Different client devices may share some of the software applications. However, different client devices may include different software applications as well.
[24] For example, an enlarged view of client device 105(1) illustrates software applications “App 1,” “App 2,” ..., “App N.” Generally, the software applications may collect logging data associated with a software application. For example, “App 1” may be associated with “Logging Data 1,” “App 2” may be associated with “Logging Data 2,” “App N” may be associated with “Logging Data N,” and so forth. The term “logging data” as used herein, generally refers to data that records events related to a software application. For example, the software application may be a virtual keyboard, and the logging data may record events related to a user interaction, time of use, duration of use, keystrokes, input modes, other software applications that access the virtual keyboard, a language model used, software updates, and so forth. Logging data may be used (e.g., by developers of a software application, by device manufacturers, and so forth), to analyze various aspects of the software application, and to improve the quality and/or reliability of the software application. This may involve providing bug fixes, security updates, functional updates, and so forth.
[25] The term “software application” or “app” as used herein, can be any computer program that is configured to interact with a user of a computing device. Example software applications can include a search application, an email application, a text message application, an instant messaging application, a web browsing application, a mapping application, a media playback
application, a weather application, a phone application, a video communication application, a camera application, an application associated with a service provider (e.g., financial, insurance, etc.), an application associated with a digital assistant (e.g., a home assistant), or any other application program configured to receive user input such as speech audio input, digital text input, alpha-numeric input, character input, and/or digital image input.
[26] The term “interaction” can broadly refer to any activity, active and/or passive, performed by a user with device 110, or an application program on device 110. For example, an interaction can involve viewing content, listening to content, inputting, editing, and/or modifying content (e.g., via a keyboard, a mouse, a tap, and so forth), a sensory interaction (e.g., haptic, visual, auditory, tactile, and so forth), a scrolling interaction, a voice interaction, a user selection, and so forth. In some embodiments, the interaction may not be a direct interaction of the user with the content. For example, the user may listen to a particular genre of songs, or watch a particular genre of movies. The computing device may determine user interaction with a particular song from a particular genre as an interaction with songs of the same genre in the library. Likewise, the computing device may determine user interaction with a particular movie from a particular genre as an interaction with movies of the same genre in the library. As another example, a user interaction with an electronic mail can be determined to be an interaction with an entire chain of electronic mails, and/or a plurality of mail exchanges with a particular sender of the electronic mail.
[27] Logging data transmission 110 may involve sending the logging data to a logging data collector 120. For example, app developers may include code that collects logging data (e.g., based on app metrics and benchmarks). In some embodiments, at an initial stage, some of the logging data may be collected in a non-differential privacy manner. In some embodiments, the initial logging data may be collected in a differential privacy manner based on an initial differential privacy parameter; however, the differential privacy parameter may not be tailored to a device, apps and/or events. Some differential privacy approaches are based on applying differential privacy to logging data at the server-side, where the logging data itself is collected in a non-differentially private manner. In some approaches, a user may load arbitrary data to a server, possibly logged in a non-differentially private manner. The uploaded data may then be queried at the server-side in a differentially private manner. In such situations, the user has to trust that the server operations are appropriately secure; however, the data could be vulnerable
to unintentional misuse. Another approach may be to apply differential privacy at the client-side, but these approaches are generally non-trivial to use and configure.
[28] The approach described herein addresses these challenges by adding differential privacy at the client-side to collect differentially private logging data, as opposed to adding differential privacy at the server-side. In implementing this, the user and/or the developer do not have to be involved in the use or configuration of the differential privacy parameters. This eliminates a need for a user or a developer to understand and/or use the complex technical underpinnings of differential privacy. The approach results in several other advantages. For example, a hostile actor eavesdropping on a client device may be thwarted in their attempts to collect information when the logging data is collected at the client-side in a differentially private manner. As another example, a user of the client device may be able to eavesdrop on the logging data to verify some extent of the differential privacy behavior.
[29] Generally, developers are able to access metrics related to how successfully their apps may be engaging their users (e.g., by using metrics such as daily active users, revenue per active user, and so forth) in a manner that helps ensure individual users cannot be identified or re-identified. By adding differential privacy automatically to app metrics at the client-side, meaningful insights may be obtained to help developers improve their apps without compromising user privacy, and/or developer confidentiality. In some embodiments, developers may be provided with feedback on app usage and metrics. For example, developers may be provided with a dashboard that displays the statistics for the logging data, or developers may be able to write SQL queries to a database and retrieve relevant information.
[30] Differential privacy platform 130 may access, and/or retrieve the logging data from logging data collector 120, and perform analyses. For example, differential privacy platform 130 may, at block 140, perform operations to analyze the received logging data. Such analysis may involve generating ε- and (ε, δ)-differentially private statistics over datasets (e.g., logging data collector 120).
[31] The differentially private statistics may utilize algorithms such as a Laplace mechanism, Gaussian mechanism, counts, sums, averages, medians, variance, standard deviation, quantiles, percentiles, automatic bounds approximation, truncated geometric thresholding, Laplace thresholding, Gaussian thresholding, and/or pre-thresholding, and so forth. In some embodiments, approaches such as Randomized Aggregatable Privacy-Preserving Ordinal Response (RAPPOR) may be applied to provide a privacy-preserving approach to learn software statistics that may be leveraged to safeguard users’ security, find bugs, and/or improve an overall user experience.
[32] In some embodiments, the analysis of the received logging data involves determining structured information related to the logging data. Such embodiments also involve determining shape information of the logging data. The adjusting of the initial differential privacy parameter may be based on the structured information and the shape information. For example, data structures that include images, and/or shapes of two-dimensional (2D) objects, may be represented as points on a suitable manifold. In such situations, the differential privacy parameter may be based on an underlying structure and/or geometry of the manifold. For example, a curvature of the manifold may impact the differential privacy parameter.
[33] Differential privacy platform 130 may, at block 150, perform operations to determine a differential privacy policy. In some embodiments, the adjusting of the initial differential privacy parameter may involve determining an amount of noise to be added to the logging data. For example, one approach to preserving ε-differential privacy is an addition of Laplacian noise to original data for information release. Another approach may involve (ε, δ)-differential privacy. Generally speaking, the parameter ε represents a privacy degree and the parameter δ represents a probability of violating privacy. For both parameters, smaller values may correspond to higher privacy.
[34] In some embodiments, the logging data may include a usage count. The adjusted differential privacy parameter may include a threshold value for a logging of the usage count. For example, a count query may return an estimate of a number of individual records in the data (e.g., how many client devices) are involved in a given event. For example, a count query may be used to return a number of client devices corresponding to international SMS messages. Differentially private responses to count queries may be obtained by adding random noise to the responses. For example, in a non-differentially privacy setting, a user may be logging in to an email application sixteen (16) times a day, indicating an actual login count. However, in a differentially private setting, an IP address may be reported as logging into the email application sixteen (16) times a day; however, there is no way to determine whether the count “16” is an actual count, or is plus or minus a random number from the actual count. For
example, the IP address may have logged in between 6 and 26 times, where a random number between -10 and 10 may be added to the actual login count.
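The noisy-count idea above can be sketched in a few lines of Python. This is a minimal illustration only; the function name, the use of numpy, and the choice of a Laplace mechanism with sensitivity 1 are assumptions rather than the patent's prescribed implementation:

```python
import numpy as np

def dp_count(true_count: int, epsilon: float = 0.5) -> int:
    """Return a differentially private version of a count.

    A count query has sensitivity 1 (adding or removing one user
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon suffices for epsilon-differential privacy.
    """
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return round(true_count + noise)

# A device that actually logged in 16 times reports a noisy count,
# so the server cannot tell whether "16" is exact or perturbed.
reported = dp_count(16)
```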
[35] In some embodiments, the logging data may include synthetic data or “fake” data. Synthetic data may be produced from a statistical model based on the original data. Synthetic data generally resembles the original sensitive data in format, and may maintain properties of the original data (e.g., correlations between attributes).
[36] In some embodiments, the adjusting of the initial differential privacy parameter may involve determining a contribution bounding to be added to the logging data. Generally, protecting unbounded contributions would require adding infinite noise. The term “contribution bounding” may generally refer to a process of limiting contributions by a single individual (or an entity represented by a privacy key) to the output dataset or its partition. For example, the differential privacy parameter may be based on a maximum contribution that may be made by a single user (or client device). In other instances, the differential privacy parameter may be based on a minimum contribution that may be made by a single user (or client device). For example, in measuring screen time usage, a minimum value may be set to 10 minutes. Accordingly, in the event a client device logs a screen time of less than 10 minutes, the logging data may be reported as 10 minutes.
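A minimal sketch of contribution bounding via clamping, using the screen-time example above (the function name and the upper bound are illustrative assumptions; the patent specifies only the 10-minute floor):

```python
def clamp(value: float, lower: float, upper: float) -> float:
    """Clamp a logged value to the contribution bounds.

    Bounding each device's contribution keeps the sensitivity of the
    aggregate finite, so a finite amount of noise can protect it.
    """
    return max(lower, min(upper, value))

# Screen time below the 10-minute floor is reported as 10 minutes;
# an upper bound would similarly cap one device's contribution.
reported_minutes = clamp(7.0, lower=10.0, upper=24 * 60.0)  # -> 10.0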
[37] In some embodiments, the adjusting of the initial differential privacy parameter may involve determining a partition selection for the logging data. The term “partition” may generally refer to a subset of the data corresponding to a given value of a statistical aggregation criterion. Generally speaking, each partition may be aggregated separately. For example, when a number of international SMS messages sent are counted, the messages for one particular country may be a single partition, and the count of SMS messages to that country may be the aggregate for that partition.
[38] Generally, logging data may be collected in aggregates prior to sending for analysis. For example, a client device may send a first SMS message from a first country to a second country. A second SMS message may be sent after some time. Accordingly, the logging data need not be transmitted individually. Instead, a country name and an aggregate count may be transmitted. Also, for example, a number of SMS messages sent to the second country may be capped at a maximum count. For example, when the maximum count is 100, then any additional SMS messages sent by the client device are no longer counted for purposes of the
logging data. The maximum count may be adjusted based on a country, a number of messages sent and/or received, and so forth.
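The client-side aggregation and capping described above might look like the following sketch (the function name is hypothetical; the cap of 100 is taken from the example in the text):

```python
from collections import Counter

MAX_COUNT = 100  # illustrative per-destination cap from the example

def aggregate_sms(events: list[str]) -> dict[str, int]:
    """Aggregate SMS logging data client-side before transmission.

    Each event is a destination country; only (country, capped count)
    pairs leave the device, never individual message records.
    """
    counts = Counter(events)
    return {country: min(n, MAX_COUNT) for country, n in counts.items()}

summary = aggregate_sms(["FR", "FR", "DE"])  # {"FR": 2, "DE": 1}
```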
[39] Differential privacy may also involve using logging data from different subsets of a group of client devices. For example, in the event 90% of the client devices in a group may be configured to collect and provide logging data, the client devices that constitute the 90% may be changed at regular intervals. Such changes are not discernible to the logging system. The data received is from 90% of the client devices, without insight into changes in the group of client devices from which the logging data is received.
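One way to rotate the reporting cohort is sketched below. The seeding scheme and function name are assumptions; the patent does not specify how the 90% subset is selected, only that its membership changes at regular intervals:

```python
import random

def select_cohort(device_ids: list[str], fraction: float = 0.9,
                  interval_seed: int = 0) -> set[str]:
    """Pick the subset of devices that report during one interval.

    Re-seeding per interval rotates which devices make up the 90%,
    while the server only ever sees "data from 90% of devices".
    """
    rng = random.Random(interval_seed)
    k = int(len(device_ids) * fraction)
    return set(rng.sample(device_ids, k))

devices = [f"dev-{i}" for i in range(100)]
cohort_week_1 = select_cohort(devices, 0.9, interval_seed=1)
cohort_week_2 = select_cohort(devices, 0.9, interval_seed=2)
```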
[40] Differential privacy platform 130 may, at block 160, perform operations to generate a configuration (config) file with a differential privacy policy. As described herein, instead of manually tuning the collection of logging data, such changes may be performed in a scalable, automated manner. The configuration file may include configuration settings that may be read during a runtime for the software application. In some embodiments, the configuration file may be configured in extensible markup language (XML), YAML Ain't Markup Language (YAML), JavaScript Object Notation (JSON), and so forth. The configuration file may include the parameters and settings for the differential privacy policy. For example, for a particular device, a particular software application, and/or a particular event associated with the particular software application, the configuration file may determine the logging data to be collected, whether certain logging data is to be collected and/or transmitted to logging data collector 120, an amount of noise to be added, when to initiate or terminate collection of logging data, and so forth. Although a single config file is described for illustrative purposes, the system may use multiple configuration files for different system components. For example, different config files may be generated for user interface (UI) settings, network and/or storage parameters, different application programs, events, and so forth. An illustrative config file is sketched below.
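For illustration, a differential privacy policy could be serialized to a JSON configuration file along these lines. All field names here are hypothetical; the patent does not fix a schema:

```python
import json

# Hypothetical schema for one metric's differential privacy policy.
dp_policy = {
    "app": "com.example.keyboard",
    "metric": "daily_key_presses",
    "epsilon": 0.1,
    "delta": 1e-5,
    "clamp_bounds": {"min": 0, "max": 5000},
    "max_contributions_per_device": 1,
    "collection_enabled": True,
}

# The client-side logging code would read this file at runtime and
# adjust how it collects and perturbs logging data.
with open("dp_config.json", "w") as f:
    json.dump(dp_policy, f, indent=2)
```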
[41] Configuration file transmission 170 may involve the transmission of the generated configuration file by differential privacy platform 130 to the plurality of client devices, 105(1), 105(2), 105(3), ..., 105(M), wherein the sending of the configuration file causes the plurality of client devices, 105(1), 105(2), 105(3), ..., 105(M) to collect modified logging data. For example, in the event the plurality of client devices, 105(1), 105(2), 105(3), ..., 105(M) were not logging data in a differentially private manner, the sending of the configuration file causes
the plurality of client devices, 105(1), 105(2), 105(3), ..., 105(M) to begin collecting logging data in a differentially private manner.
[42] Also, for example, the configuration file may be iteratively generated and the sending of the configuration file may cause the plurality of client devices, 105(1), 105(2), 105(3), ..., 105(M) to change a manner in which logging data is collected in a differentially private manner. For example, modified logging data associated with the software application may be received (e.g., by logging data collector 120), and analyzed (e.g., by differential privacy platform 130 at block 140). At block 150, the adjusted differential privacy parameter may be readjusted based on another analysis of the received modified logging data. At block 160, a second configuration file comprising the adjusted differential privacy parameter as readjusted may be generated. Configuration file transmission 170 may involve the transmission of the second configuration file to the plurality of client devices, 105(1), 105(2), 105(3), ..., 105(M).
[43] FIG. 2 illustrates an example tabular format 200 for determination of a differential privacy parameter, in accordance with example embodiments. In some embodiments, the received logging data may be arranged in a tabular format 200 comprising a first plurality of rows, R1, R2, ..., RN, and a second plurality of columns, C1, C2, ..., CK. Each row of the first plurality of rows R1, R2, ..., RN, may correspond to a client device of the plurality of client devices (e.g., the plurality of client devices, 105(1), 105(2), 105(3), ..., 105(M)).
[44] In some embodiments, each row of the first plurality of rows, R1, R2, ..., RN, may correspond to a different client device. In some embodiments, multiple rows may correspond to a single client device. Also, for example, logging data may not be collected from all the client devices. In some embodiments, each column of the second plurality of columns, C1, C2, ..., CK, may correspond to an attribute. The term “attribute” may refer to any type of logging data that may be collected. For example, the attribute may be associated with a particular software application, a particular event associated with the software application, and so forth. For example, an attribute may correspond to a color of a button with values “blue,” “green,” or “red”. In some embodiments, these values may be represented numerically, for example, as blue = 0, red = 1, green = 2. As another example, an attribute may be a record of a number of clicks with positive integer values.
[45] As indicated in row 205, aggregated statistics, S1, S2, ..., SK, may be determined for one or more columns of the second plurality of columns, C1, C2, ..., CK. The adjusting of the initial differential privacy parameter may be performed based on the aggregated statistics, S1, S2, ..., SK, associated with one or more of the second plurality of columns, C1, C2, ..., CK. The aggregated statistics, S1, S2, ..., SK may include a count, mean, variance, quantile, and other statistical measures of data.
[46] In some embodiments, the adjusting of the initial differential privacy parameter may be performed based on the aggregated statistics, S1, S2, ..., SK, associated with the one or more columns of the second plurality of columns, C1, C2, ..., CK. For example, as indicated in row 210, one or more columns of the second plurality of columns, C1, C2, ..., CK may be associated with a differential privacy parameter D1, D2, ..., DK. The differential privacy parameter D1, D2, ..., DK may include noise, sensitivity, clamping bounds, contribution limits, and so forth. In some embodiments, the differential privacy parameter may be determined for one or more columns. In some embodiments, the differential privacy parameter may be determined for a portion or an entirety of a database.
[47] For example, the adjusting of the initial differential privacy parameter may involve determining differential privacy parameters D1, D2, ..., DK by determining an amount of noise to be added to the logging data based on the aggregated statistics, S1, S2, ..., SK, associated with the one or more columns of the second plurality of columns, C1, C2, ..., CK. In some embodiments, the determining of the amount of noise may involve determining a Laplacian noise distribution (e.g., parametrized by sensitivity divided by epsilon) for the one or more columns of the second plurality of columns. For example, the Laplacian noise for a given column may be based on a mean and standard deviation of the values in the given column. A sketch of this per-column noise addition is shown below.
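The following is a minimal sketch of column-wise Laplace noise sized by sensitivity divided by epsilon, as described above (the function names are illustrative assumptions, and the example sensitivity of 1 assumes a per-device contribution bound of one click):

```python
import numpy as np

def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale of the Laplace distribution for one column: sensitivity / epsilon."""
    return sensitivity / epsilon

def noisy_column(values: np.ndarray, sensitivity: float,
                 epsilon: float) -> np.ndarray:
    """Add column-wise Laplace noise sized by the column's sensitivity."""
    b = laplace_scale(sensitivity, epsilon)
    return values + np.random.laplace(0.0, b, size=values.shape)

# Column C1 holds click counts with per-device contribution bound 1.
noisy_c1 = noisy_column(np.array([3.0, 7.0, 16.0]),
                        sensitivity=1.0, epsilon=0.1)
```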
[48] For example, column C1 may include click-rate counts (e.g., a count of users clicking a blue button, a red button, or a green button), and aggregated statistics S1 may correspond to a sum, a weighted sum (e.g., weighted by type of device, time of day, geographical location, etc.), a mean, a mode, or other statistics. Accordingly, differential privacy parameter D1 may correspond to a random addition of noise to the aggregated statistics S1. This may include adding a random number between -10 and +10 to the logging data that is to be collected in the future. Also, for example, differential privacy parameter D1 may indicate whether such logging data is to be collected from all the devices, a sub-plurality of the devices, a particular group, and so forth. In some embodiments, differential privacy parameter D1 may change a group of client devices to which the updated differential privacy parameter D1 is to be applied. A configuration file may include such information.
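A one-line illustration of the uniform perturbation mentioned above, assuming Python; the count value is hypothetical:

```python
import random

# Hypothetical click-rate count aggregated from column C1.
s1 = 128

# Add an integer drawn uniformly from [-10, +10], as in the example above.
noisy_s1 = s1 + random.randint(-10, 10)
print(noisy_s1)
```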
[49] As another example, column C2 may include a count of SMS messages sent to a particular country, and aggregated statistics S2 may correspond to a mean m and variance v for a probability distribution for the data in column C2. Accordingly, differential privacy parameter D2 may correspond to a random addition of noise to the aggregated statistics S2. This may include adding a random deviation from the mean m between -kv and +kv to the logging data that is to be collected in the future. Here, the value of k may be determined to be 1, 2, etc., depending on a number of client devices, a number of counts based on the logging data, the actual values of the mean m and the variance v, and so forth. Also, for example, differential privacy parameter D2 may indicate whether such logging data is to be collected from all the devices, a sub-plurality of the devices, a particular group, and so forth. In some embodiments, differential privacy parameter D2 may change a group of client devices to which the updated differential privacy parameter D2 is to be applied. A configuration file may include such information.
[50] In some embodiments, for a database D (e.g., tabular format 200), each row in D may represent a "log upload" from a client device. Let C denote the column set of D. Some embodiments may involve estimating a cardinality n and a sensitivity S. The cardinality n represents a number of unique devices having a "log upload" in D. The cardinality n may be computed using a unique identifier in the log upload, or estimated based on the size of D and an expected number of logs per client device.
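A minimal sketch of both approaches to obtaining the cardinality n, assuming Python; the device identifiers and the expected-logs-per-device figure are hypothetical:

```python
# Two ways to obtain the cardinality n (unique devices with a log upload in D).
rows = [
    {"device_id": "d1"}, {"device_id": "d2"},
    {"device_id": "d1"}, {"device_id": "d3"},
]

# (a) Exact count from a unique identifier in each log upload.
n_exact = len({r["device_id"] for r in rows})

# (b) Estimate from the size of D and an expected number of logs per device.
expected_logs_per_device = 1.5  # hypothetical
n_estimate = round(len(rows) / expected_logs_per_device)

print(n_exact, n_estimate)
```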
[51] In some embodiments, the sensitivity S of a column c in C for the database D may be defined as a maximum contribution by a unique client device. For example, "1" may represent a single count in a contribution by the client device. Generally, the sensitivity S determines an amount of noise to be added to a particular column. For example, the noise may be drawn from a Laplacian distribution, parametrized by sensitivity S divided by ε. The sensitivity S of the database D may be generalized as:
Δf = max ||f(D1) − f(D2)||_1

(Eqn. 1)

[52] where D1 and D2 are neighboring databases, with a difference of at most one row, and the norm may correspond to an ℓ1 norm. In some embodiments, D1 and D2 may be determined by removing contributions from any individual user (or client device) from the collected data. In the event each individual (e.g., person, device) uploads one row of data, then D1 and D2 may be determined by dropping an arbitrary row from the collected data.
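A minimal sketch of estimating the ℓ1 sensitivity of Eqn. 1 by dropping one row (one device's contribution) at a time, assuming Python with NumPy; the aggregation function f and the row values are hypothetical:

```python
import numpy as np

def l1_sensitivity(f, rows):
    """Estimate Δf = max over dropped rows of ||f(D1) - f(D2)||_1, where
    D2 is D1 with one row (one device's contribution) removed (Eqn. 1)."""
    full = np.asarray(f(rows), dtype=float)
    diffs = []
    for i in range(len(rows)):
        neighbor = rows[:i] + rows[i + 1:]  # D2: drop one row from D1
        diffs.append(np.abs(full - np.asarray(f(neighbor), dtype=float)).sum())
    return max(diffs)

# f computes per-column sums over hypothetical two-column rows.
f = lambda rs: [sum(r[0] for r in rs), sum(r[1] for r in rs)]
print(l1_sensitivity(f, [(4, 1), (7, 0), (2, 3)]))  # -> 7.0
```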
[53] For example, in the event D1 corresponds to:
Table 1
[54] Then D2 may correspond to:
Table 2
[55] In some embodiments, column-wise parameters μc, bc for a Laplacian noise may be determined for a column c in C as follows:

Lap(x | μ, b) = (1 / 2b) · exp(−|x − μ| / b)

(Eqn. 2)

in order to achieve a certain level of (ε, δ)-differential privacy. Generally, Lap(x | μ, b) represents the Laplacian distribution parameterized by a location μ and a scale b. In some embodiments, μ may be set to zero. Also, for example, b = sensitivity / ε, or Δf/ε. The values of ε, δ may be chosen to optimize privacy while maintaining an acceptable level of utility of the logging data (e.g., for a specified use case). In some embodiments, ε = 0.1 and δ = 10⁻⁵ may be used. Accordingly, the values μc, bc for the Laplacian noise may be determined for the column c by setting μc = 0, and bc = Δf/ε. In the event the columns represent independent events, the values of ε and/or δ may be different for different columns. Also, for example, the values of ε and/or δ may be determined based on a sensitivity of the data and/or a size of the database.
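A minimal sketch of deriving the per-column parameters under the convention above (μc = 0, bc = Δf/ε), assuming Python; the per-column sensitivities are hypothetical, and ε = 0.1 matches the example value in the text:

```python
def laplace_params(sensitivity: float, epsilon: float = 0.1):
    """Per-column Laplace parameters: location mu_c = 0, scale b_c = Δf / ε."""
    return {"mu": 0.0, "b": sensitivity / epsilon}

# Hypothetical per-column sensitivities Δf.
column_sensitivity = {"C1": 1.0, "C2": 5.0}
params = {c: laplace_params(s) for c, s in column_sensitivity.items()}
print(params)  # e.g., {'C1': {'mu': 0.0, 'b': 10.0}, 'C2': {'mu': 0.0, 'b': 50.0}}
```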
[56] The choice of ε and/or δ may be adjusted based on an intended use for the logging data. For example, running a particular software application may cause a significant depletion of device battery. In such an instance, a lower value for ε may be used. However, to measure click rates for a particular selectable button presented in a UI, a higher value for ε may be used. Generally, when an accurate directional trend for the logging data is available, a much higher level of inaccuracy may be acceptable. Accordingly, in the first example of battery consumption, one may add or subtract a random number such as "10" to or from the logging data. In the second case for click rates, one may adjust by one standard deviation of the click-rate distribution.
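One way such a use-dependent choice of ε might be expressed, as a hypothetical Python policy table; the use-case names and values are illustrative, not from the disclosure:

```python
# Lower epsilon (more noise) where a rough directional signal suffices,
# higher epsilon where finer-grained accuracy is needed.
EPSILON_BY_USE = {
    "battery_depletion_trend": 0.05,  # directional trend only
    "button_click_rate": 0.5,         # finer-grained measurement
}

def epsilon_for(use_case: str, default: float = 0.1) -> float:
    return EPSILON_BY_USE.get(use_case, default)

print(epsilon_for("battery_depletion_trend"))
```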
[57] In some embodiments, the sensitivity, a given value of δ, and a given value of ε may be synchronized, for a number of devices, to every device in a private group. In parallel, some proportion of the client devices may be selected to not belong to the private group. Such unselected client devices may become part of another group used to estimate the next cardinality and sensitivity. An observer of logs from an individual client device would not be able to identify whether the client device belongs to the private or the non-private group. Once the client devices receive the configuration file (for example, with parameters μc, bc for the Laplacian noise), the client devices may collect and report logging data in accordance with those parameters.
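A minimal sketch of the private/non-private group assignment, assuming Python; the holdout fraction and device identifiers are hypothetical:

```python
import random

def assign_groups(device_ids, non_private_fraction=0.05, seed=None):
    """Randomly hold out a fraction of devices as a non-private sample group
    used to re-estimate cardinality and sensitivity; the rest form the
    private group that applies the synchronized Laplace parameters."""
    rng = random.Random(seed)
    non_private = {d for d in device_ids if rng.random() < non_private_fraction}
    private = set(device_ids) - non_private
    return private, non_private

private, holdout = assign_groups([f"d{i}" for i in range(100)], seed=7)
print(len(private), len(holdout))
```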
[58] In some embodiments, a percentage of logging data may be sampled in a non-differentially-private manner to assess whether the parameters for a given metric remain correct. For example, typical data at a time of initial processing may be as illustrated in Table 3 below:
Table 3
[59] However, the underlying software may have changed and logging data collected three months later may correspond to:
Table 4
[60] In such an instance, the non-private sample group would enable a periodic repetition of a derivation of parameters μc, bc for the Laplacian noise. In the event the updated statistics (updated parameters μc, bc) for the adjusted logging data are determined to be substantially similar to the prior statistics (prior parameters μc, bc) corresponding to the prior logging data, no further action may be needed. However, in the event the updated statistics (updated parameters μc, bc) for the adjusted logging data are determined to be substantially dissimilar to the prior statistics (prior parameters μc, bc) corresponding to the prior logging data, then the updated parameters μc, bc may be used to generate an updated configuration file.
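A minimal sketch of the similarity check described above, assuming Python; the relative tolerance and the parameter values are hypothetical:

```python
def needs_new_config(prior, updated, rel_tol=0.25):
    """Compare prior vs. updated per-column Laplace parameters; return True
    when any scale b_c drifted beyond a (hypothetical) relative tolerance,
    signaling that an updated configuration file should be generated."""
    for col, p in prior.items():
        u = updated.get(col, p)
        if p["b"] and abs(u["b"] - p["b"]) / p["b"] > rel_tol:
            return True
    return False

prior = {"C1": {"mu": 0.0, "b": 10.0}}
updated = {"C1": {"mu": 0.0, "b": 40.0}}  # underlying software changed
print(needs_new_config(prior, updated))   # True -> regenerate config file
```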
[61] Generally speaking, a user may be provided with controls allowing the user to make an election as to both if and when systems, programs, or features described herein may enable collection of logging data, and if the user is sent content or communications from a server. In addition, certain data may be treated in one or more ways before it is stored or used, so that personal data is removed, secured, encrypted, and so forth. For example, a purpose of differential privacy is to enable a user's data to be treated in a manner such that no personally identifying information can be determined for the user, or such that a user's geographic location is generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined. Thus, the user may have control over what logging data is collected, how that logging data is used, what logging data is stored (e.g., on the user device, the server, etc.), and what logging data is provided to the user. Also, for example, the user may have an ability to delete or modify any user information.
Example Data Network
[62] FIG. 3 depicts a distributed computing architecture 300, in accordance with example embodiments. Distributed computing architecture 300 includes server devices 308, 310 that are configured to communicate, via network 306, with programmable devices 304a, 304b, 304c, 304d, 304e. Network 306 may correspond to a local area network (LAN), a wide area network (WAN), a WLAN, a WWAN, a corporate intranet, the public Internet, or any other type of network configured to provide a communications path between networked computing devices.
Network 306 may also correspond to a combination of one or more LANs, WANs, corporate intranets, and/or the public Internet.
[63] Although FIG. 3 only shows five programmable devices, distributed application architectures may serve tens, hundreds, or thousands of programmable devices. Moreover, programmable devices 304a, 304b, 304c, 304d, 304e (or any additional programmable devices) may be any sort of computing device, such as a mobile computing device, desktop computer, wearable computing device, head-mountable device (HMD), network terminal, and so on. In some examples, such as illustrated by programmable devices 304a, 304b, 304c, 304e, programmable devices can be directly connected to network 306. In other examples, such as illustrated by programmable device 304d, programmable devices can be indirectly connected to network 306 via an associated computing device, such as programmable device 304c. In this example, programmable device 304c can act as an associated computing device to pass electronic communications between programmable device 304d and network 306. In other examples, such as illustrated by programmable device 304e, a computing device can be part of and/or inside a vehicle, such as a car, a truck, a bus, a boat or ship, an airplane, etc. In other examples not shown in FIG. 3, a programmable device can be both directly and indirectly connected to network 306.
[64] Server devices 308, 310 can be configured to perform one or more services, as requested by programmable devices 304a-304e. For example, server device 308 and/or 310 can provide content to programmable devices 304a-304e. The content can include, but is not limited to, web pages, hypertext, scripts, binary data such as compiled software, images, audio, and/or video. The content can include compressed and/or uncompressed content. The content can be encrypted and/or unencrypted. Other types of content are possible as well.
[65] As another example, server devices 308 and/or 310 can provide programmable devices 304a-304e with access to software for database, search, computation, graphical, audio, video, World Wide Web/Internet utilization, and/or other functions. Many other examples of server devices are possible as well.
Computing Device Architecture
[66] FIG. 4 is a block diagram of an example computing device 400, in accordance with example embodiments. In particular, computing device 400 shown in FIG. 4 can be configured to perform at least one function of and/or related to method 500.
[67] Computing device 400 may include a user interface module 401, a network communications module 402, one or more processors 403, data storage 404, one or more camera(s) 418, one or more sensors 420, and power system 422, all of which may be linked together via a system bus, network, or other connection mechanism 405.
[68] User interface module 401 can be operable to send data to and/or receive data from external user input/output devices. For example, user interface module 401 can be configured to send and/or receive data to and/or from user input devices such as a touch screen, a computer mouse, a keyboard, a keypad, a touch pad, a trackball, a joystick, a voice recognition module, and/or other similar devices. User interface module 401 can also be configured to provide output to user display devices, such as one or more cathode ray tubes (CRT), liquid crystal displays, light emitting diodes (LEDs), displays using digital light processing (DLP) technology, printers, light bulbs, and/or other similar devices, either now known or later developed. User interface module 401 can also be configured to generate audible outputs, with devices such as a speaker, speaker jack, audio output port, audio output device, earphones, and/or other similar devices. User interface module 401 can further be configured with one or more haptic devices that can generate haptic outputs, such as vibrations and/or other outputs detectable by touch and/or physical contact with computing device 400. In some examples, user interface module 401 can be used to provide a graphical user interface (GUI) for utilizing computing device 400.
[69] Network communications module 402 can include one or more devices that provide wireless interface(s) 407 and/or wireline interface(s) 408 that are configurable to communicate via a network. Wireless interface(s) 407 can include one or more wireless transmitters, receivers, and/or transceivers, such as a Bluetooth™ transceiver, a Zigbee® transceiver, a WiFi™ transceiver, a WiMAX™ transceiver, an LTE™ transceiver, and/or other type of wireless transceiver configurable to communicate via a wireless network. Wireline interface(s) 408 can include one or more wireline transmitters, receivers, and/or transceivers, such as an Ethernet transceiver, a Universal Serial Bus (USB) transceiver, or similar transceiver configurable to communicate via a twisted pair wire, a coaxial cable, a fiber-optic link, or a similar physical connection to a wireline network.
[70] In some examples, network communications module 402 can be configured to provide reliable, secured, and/or authenticated communications. For each communication described
herein, information for facilitating reliable communications (e.g., guaranteed message delivery) can be provided, perhaps as part of a message header and/or footer (e.g., packet/message sequencing information, encapsulation headers and/or footers, size/time information, and transmission verification information such as cyclic redundancy check (CRC) and/or parity check values). Communications can be made secure (e.g., be encoded or encrypted) and/or decrypted/decoded using one or more cryptographic protocols and/or algorithms, such as, but not limited to, Data Encryption Standard (DES), Advanced Encryption Standard (AES), a Rivest-Shamir-Adleman (RSA) algorithm, a Diffie-Hellman algorithm, a secure sockets protocol such as Secure Sockets Layer (SSL) or Transport Layer Security (TLS), and/or Digital Signature Algorithm (DSA). Other cryptographic protocols and/or algorithms can be used as well or in addition to those listed herein to secure (and then decrypt/decode) communications.
[71] One or more processors 403 can include one or more general purpose processors, and/or one or more special purpose processors (e.g., digital signal processors, tensor processing units (TPUs), graphics processing units (GPUs), application specific integrated circuits, etc.). One or more processors 403 can be configured to execute computer-readable instructions 406 that are contained in data storage 404 and/or other instructions as described herein.
[72] Data storage 404 can include one or more non-transitory computer-readable storage media that can be read and/or accessed by at least one of one or more processors 403. The one or more computer-readable storage media can include volatile and/or non-volatile storage components, such as optical, magnetic, organic or other memory or disc storage, which can be integrated in whole or in part with at least one of one or more processors 403. In some examples, data storage 404 can be implemented using a single physical device (e.g., one optical, magnetic, organic or other memory or disc storage unit), while in other examples, data storage 404 can be implemented using two or more physical devices.
[73] Data storage 404 can include computer-readable instructions 406 and perhaps additional data. In some examples, data storage 404 can include storage required to perform at least part of the herein-described methods, scenarios, and techniques and/or at least part of the functionality of the herein-described devices and networks. In some examples, data storage 404 can include storage for a logging data module 412 (e.g., to collect and store differentially private logging data based on a configuration file). In particular of these examples, computer-
readable instructions 406 can include instructions that, when executed by the one or more processors 403, enable computing device 400 to provide for some or all of the functionality of logging data module 412.
[74] In some examples, computing device 400 can include camera(s) 418. Camera(s) 418 can include one or more image capture devices, such as still and/or video cameras, equipped to capture light and record the captured light in one or more images; that is, camera(s) 418 can generate image(s) of captured light. The one or more images can be one or more still images and/or one or more images utilized in video imagery. Camera(s) 418 can capture light and/or electromagnetic radiation emitted as visible light, infrared radiation, ultraviolet light, and/or as one or more other frequencies of light.
[75] In some examples, computing device 400 can include one or more sensors 420. Sensors 420 can be configured to measure conditions within computing device 400 and/or conditions in an environment of computing device 400 and provide data about these conditions. For example, sensors 420 can include one or more of: (i) sensors for obtaining data about computing device 400, such as, but not limited to, a thermometer for measuring a temperature of computing device 400, a battery sensor for measuring power of one or more batteries of power system 422, and/or other sensors measuring conditions of computing device 400; (ii) an identification sensor to identify other objects and/or devices, such as, but not limited to, a Radio Frequency Identification (RFID) reader, proximity sensor, one-dimensional barcode reader, two-dimensional barcode (e.g., Quick Response (QR) code) reader, and a laser tracker, where the identification sensors can be configured to read identifiers, such as RFID tags, barcodes, QR codes, and/or other devices and/or objects configured to be read and provide at least identifying information; (iii) sensors to measure locations and/or movements of computing device 400, such as, but not limited to, a tilt sensor, a gyroscope, an accelerometer, a Doppler sensor, a GPS device, a sonar sensor, a radar device, a laser-displacement sensor, and a compass; (iv) an environmental sensor to obtain data indicative of an environment of computing device 400, such as, but not limited to, an infrared sensor, an optical sensor, a light sensor, a biosensor, a capacitive sensor, a touch sensor, a temperature sensor, a wireless sensor, a radio sensor, a movement sensor, a microphone, a sound sensor, an ultrasound sensor, and/or a smoke sensor; and/or (v) a force sensor to measure one or more forces (e.g., inertial forces and/or G-forces) acting about computing device 400, such as, but not limited to, one or more
sensors that measure: forces in one or more dimensions, torque, ground force, friction, and/or a zero moment point (ZMP) sensor that identifies ZMPs and/or locations of the ZMPs. Many other examples of sensors 420 are possible as well.
[76] Power system 422 can include one or more batteries 424 and/or one or more external power interfaces 426 for providing electrical power to computing device 400. Each battery of the one or more batteries 424 can, when electrically coupled to the computing device 400, act as a source of stored electrical power for computing device 400. One or more batteries 424 of power system 422 can be configured to be portable. Some or all of one or more batteries 424 can be readily removable from computing device 400. In other examples, some or all of one or more batteries 424 can be internal to computing device 400, and so may not be readily removable from computing device 400. Some or all of one or more batteries 424 can be rechargeable. For example, a rechargeable battery can be recharged via a wired connection between the battery and another power supply, such as by one or more power supplies that are external to computing device 400 and connected to computing device 400 via the one or more external power interfaces. In other examples, some or all of one or more batteries 424 can be non-rechargeable batteries.
[77] One or more external power interfaces 426 of power system 422 can include one or more wired-power interfaces, such as a USB cable and/or a power cord, that enable wired electrical power connections to one or more power supplies that are external to computing device 400. One or more external power interfaces 426 can also include one or more wireless power interfaces, such as a Qi wireless charger, that enable wireless electrical power connections to one or more external power supplies. Once an electrical power connection is established to an external power source using one or more external power interfaces 426, computing device 400 can draw electrical power from the external power source via the established electrical power connection. In some examples, power system 422 can include related sensors, such as battery sensors associated with one or more batteries or other types of electrical power sensors.
Example Methods of Operation
[78] FIG. 5 is a flowchart of a method 500, in accordance with example embodiments. Method 500 can be executed by a computing device, such as computing device 400.
[79] Method 500 can begin at block 510, where the method involves receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter.
[80] At block 520, the method further involves adjusting the initial differential privacy parameter based on an analysis of the received logging data.
[81] At block 530, the method also involves generating a configuration file comprising the adjusted differential privacy parameter.
[82] At block 540, the method additionally involves sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
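A minimal end-to-end sketch of blocks 510 to 540, assuming Python with NumPy; the function name, JSON layout, and the sensitivity heuristic are illustrative, not the claimed implementation:

```python
import json
import numpy as np

def method_500(logging_rows, epsilon=0.1):
    """Sketch of blocks 510-540: analyze received logging data, adjust the
    differential privacy parameter, and emit a configuration file."""
    # Blocks 510/520: analyze the received logging data.
    data = np.asarray(logging_rows, dtype=float)
    sensitivity = np.abs(data).sum(axis=1).max()  # max per-row contribution
    # Block 530: generate a configuration file with the adjusted parameter.
    config = {"laplace": {"mu": 0.0, "b": sensitivity / epsilon}}
    return json.dumps(config)

# Block 540 would send this file to the client devices, which then collect
# modified logging data according to the new parameters.
print(method_500([(4, 1), (7, 0), (2, 3)]))
```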
[83] Some embodiments involve receiving, from the plurality of client devices, the modified logging data associated with the software application. Such embodiments also involve readjusting the adjusted differential privacy parameter based on another analysis of the received modified logging data. Such embodiments further involve generating a second configuration file comprising the adjusted differential privacy parameter as readjusted. Such embodiments additionally involve sending the second configuration file to the plurality of client devices.
[84] In some embodiments, the analysis of the received logging data involves determining structured information related to the logging data. Such embodiments also involve determining shape information of the logging data. The adjusting of the initial differential privacy parameter may be based on the structured information and the shape information.
[85] In some embodiments, the logging data may include a usage count. The adjusted differential privacy parameter may include a threshold value for a logging of the usage count.
[86] In some embodiments, the received logging data may be arranged in a tabular format comprising a first plurality of rows and a second plurality of columns. Each row of the first plurality of rows may correspond to a client device of the plurality of client devices. The adjusting of the initial differential privacy parameter may be performed based on aggregated statistics associated with one or more of the second plurality of columns.
[87] In some embodiments, the adjusting of the initial differential privacy parameter may be performed by determining an amount of noise to be added to the logging data based on the
aggregated statistics associated with the one or more columns of the second plurality of columns.
[88] In some embodiments, the determining of the amount of noise may involve determining a Laplacian noise for the one or more columns of the second plurality of columns.
[89] In some embodiments, the Laplacian noise for a given column may be based on a mean and standard deviation of the values in the given column.
[90] In some embodiments, the sending of the configuration file to the plurality of client devices may include sending the configuration file to a first sub plurality of the plurality of client devices, and not sending the configuration file to a second sub plurality of the plurality of client devices.
[91] In some embodiments, the adjusting of the initial differential privacy parameter may involve determining an amount of noise to be added to the logging data.
[92] In some embodiments, the adjusting of the initial differential privacy parameter may involve determining a contribution bounding to be added to the logging data.
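A minimal sketch of contribution bounding via clamping, assuming Python with NumPy; the bounds are hypothetical:

```python
import numpy as np

def bound_contributions(values, lower, upper):
    """Clamp each device's contribution to [lower, upper] before aggregation,
    which caps the per-device sensitivity at (upper - lower)."""
    return np.clip(values, lower, upper)

print(bound_contributions(np.array([0, 3, 250, 12]), lower=0, upper=20))
```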
[93] In some embodiments, the adjusting of the initial differential privacy parameter may involve determining a partition selection for the logging data.
[94] The present disclosure is not to be limited in terms of the particular embodiments described in this application, which are intended as illustrations of various aspects. Many modifications and variations can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those enumerated herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims.
[95] The above detailed description describes various features and functions of the disclosed systems, devices, and methods with reference to the accompanying figures. In the figures, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, figures, and claims are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and
illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
[96] With respect to any or all of the ladder diagrams, scenarios, and flow charts in the figures and as discussed herein, each block and/or communication may represent a processing of information and/or a transmission of information in accordance with example embodiments. Alternative embodiments are included within the scope of these example embodiments. In these alternative embodiments, for example, functions described as blocks, transmissions, communications, requests, responses, and/or messages may be executed out of order from that shown or discussed, including substantially concurrent or in reverse order, depending on the functionality involved. Further, more or fewer blocks and/or functions may be used with any of the ladder diagrams, scenarios, and flow charts discussed herein, and these ladder diagrams, scenarios, and flow charts may be combined with one another, in part or in whole.
[97] A block that represents a processing of information may correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a block that represents a processing of information may correspond to a module, a segment, or a portion of program code (including related data). The program code may include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique. The program code and/or related data may be stored on any type of computer readable medium such as a storage device including a disk or hard drive or other storage medium.
[98] The computer readable medium may also include non-transitory computer readable media such as non-transitory computer-readable media that stores data for short periods of time like register memory, processor cache, and random access memory (RAM). The computer readable media may also include non-transitory computer readable media that stores program code and/or data for longer periods of time, such as secondary or persistent long term storage, like read only memory (ROM), optical or magnetic disks, compact-disc read only memory (CD-ROM), for example. The computer readable media may also be any other volatile or nonvolatile storage systems. A computer readable medium may be considered a computer readable storage medium, for example, or a tangible storage device.
[99] Moreover, a block that represents one or more information transmissions may correspond to information transmissions between software and/or hardware modules in the
same physical device. However, other information transmissions may be between software modules and/or hardware modules in different physical devices.
[100] With respect to embodiments that include determining a non-common term based on user interaction with a computing device, and/or determining mistranscribed terms using a machine learning model, or interactions by the computing device with cloud-based servers, a user may be provided with controls allowing the user to make an election as to both if and when systems, programs, or features described herein may enable collection of user information (e.g., information about a user's social network, social actions, or activities, profession, a user's preferences, a user's demographic information, a user's current location, or other personal information), and if the user is sent content or communications from a server. In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined. Thus, the user may have control over what information is collected about the user, how that information is used, and what information is provided to the user.
[101] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are provided for explanatory purposes and are not intended to be limiting, with the true scope being indicated by the following claims.
Claims
1. A computer-implemented method, comprising: receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter; adjusting the initial differential privacy parameter based on an analysis of the received logging data; generating a configuration file comprising the adjusted differential privacy parameter; and sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
2. The computer-implemented method of claim 1, further comprising: receiving, from the plurality of client devices, the modified logging data associated with the software application; readjusting the adjusted differential privacy parameter based on another analysis of the received modified logging data; generating a second configuration file comprising the adjusted differential privacy parameter as readjusted; and sending the second configuration file to the plurality of client devices.
3. The computer-implemented method of claim 1, wherein the analysis of the received logging data comprises: determining structured information related to the logging data; determining shape information of the logging data, and wherein the adjusting of the initial differential privacy parameter is based on the structured information and the shape information.
4. The computer-implemented method of claim 1, wherein the logging data comprises a usage count, and wherein the adjusted differential privacy parameter comprises a threshold value for a logging of the usage count.
5. The computer-implemented method of claim 1, wherein the received logging data is arranged in a tabular format comprising a first plurality of rows and a second plurality of columns, and wherein each row of the first plurality of rows corresponds to a client device of the plurality of client devices, and wherein the adjusting of the initial differential privacy parameter is performed based on aggregated statistics associated with one or more columns of the second plurality of columns.
6. The computer-implemented method of claim 5, wherein the adjusting of the initial differential privacy parameter is performed by determining an amount of noise to be added to the logging data based on the aggregated statistics associated with the one or more columns of the second plurality of columns.
7. The computer-implemented method of claim 6, wherein the determining of the amount of noise comprises determining a Laplacian noise for the one or more columns of the second plurality of columns.
8. The computer-implemented method of claim 7, wherein the Laplacian noise for a given column is based on a mean and standard deviation of the values in the given column.
9. The computer-implemented method of claim 1, wherein the sending of the configuration file to the plurality of client devices comprises sending the configuration file to a first sub plurality of the plurality of client devices, and not sending the configuration file to a second sub plurality of the plurality of client devices.
10. The computer-implemented method of claim 1, wherein the adjusting of the initial differential privacy parameter comprises determining an amount of noise to be added to the logging data.
11. The computer-implemented method of claim 1, wherein the adjusting of the initial differential privacy parameter comprises determining a contribution bounding to be added to the logging data.
12. The computer-implemented method of claim 1, wherein the adjusting of the initial differential privacy parameter comprises determining a partition selection for the logging data.
13. A computing device, comprising: one or more processors; and data storage, wherein the data storage has stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing device to carry out functions comprising: receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter; adjusting the initial differential privacy parameter based on an analysis of the received logging data; generating a configuration file comprising the adjusted differential privacy parameter; and sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
14. The computing device of claim 13, wherein the functions further comprise: receiving, from the plurality of client devices, the modified logging data associated with the software application; readjusting the adjusted differential privacy parameter based on another analysis of the received modified logging data;
generating a second configuration file comprising the readjusted differential privacy parameter; and sending the second configuration file to the plurality of client devices.
15. The computing device of claim 13, wherein the functions comprising the analysis of the received logging data further comprise: determining structured information related to the logging data; determining shape information of the logging data, and wherein the adjusting of the initial differential privacy parameter is based on the structured information and the shape information.
16. The computing device of claim 13, wherein the logging data comprises a usage count, and wherein the adjusted differential privacy parameter comprises a threshold value for a logging of the usage count.
17. The computing device of claim 13, wherein the received logging data is arranged in a tabular format comprising a first plurality of rows and a second plurality of columns, and wherein each row of the first plurality of rows corresponds to a client device of the plurality of client devices, and wherein the adjusting of the initial differential privacy parameter is performed based on aggregated statistics associated with one or more of the second plurality of columns.
18. The computing device of claim 13, wherein the functions comprising the sending of the configuration file to the plurality of client devices comprise sending the configuration file to a first sub plurality of the plurality of client devices, and not sending the configuration file to a second sub plurality of the plurality of client devices.
19. The computing device of claim 13, wherein the adjusting of the initial differential privacy parameter comprises determining an amount of noise to be added to the logging data.
20. The computing device of claim 13, wherein the adjusting of the initial differential privacy parameter comprises determining a contribution bounding to be added to the logging data.
21. The computing device of claim 13, wherein the adjusting of the initial differential privacy parameter comprises determining a partition selection for the logging data.
22. An article of manufacture comprising one or more non-transitory computer readable media having computer-readable instructions stored thereon that, when executed by one or more processors of a computing device, cause the computing device to carry out functions comprising: receiving, from a plurality of client devices, logging data associated with a software application, wherein the software application is installed in the plurality of client devices, and wherein the logging data is based on an initial differential privacy parameter; adjusting the initial differential privacy parameter based on an analysis of the received logging data; generating a configuration file comprising the adjusted differential privacy parameter; and sending the configuration file to the plurality of client devices, wherein the sending of the configuration file causes the plurality of client devices to collect modified logging data.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2024/021497 WO2025207082A1 (en) | 2024-03-26 | 2024-03-26 | Systems and methods for an automatic differential privacy policy |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025207082A1 true WO2025207082A1 (en) | 2025-10-02 |
Family
ID=90826292