WO2025072505A1 - Using large language models to generate user interface components
- Publication number
- WO2025072505A1 (PCT/US2024/048636)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- instructions
- computing system
- examples
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/451—Execution arrangements for user interfaces
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04817—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance using icons
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/0482—Interaction with lists of selectable items, e.g. menus
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/038—Indexing scheme relating to G06F3/038
- G06F2203/0381—Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
- G06F3/04883—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
Definitions
- a remote computing device may include one or more applications with a plurality of functions that may be statically defined or predefined at compile time and do not change during execution.
- a computing system in communication with the computing device may retrieve, with explicit user consent, and using an application programming interface, information associated with the plurality of functions.
- the computing system may also receive an indication of a natural language user input (e.g., audio or text input from a user operating the remote computing device) associated with the plurality of functions.
- the computing system may receive an indication of a voice input that includes multiple commands and/or user intents associated with one or more applications, such as, e.g., “Send message to Jenny to arrange childcare, book doctor’s appointment for Jane, schedule the meeting with John, order dinner, and call the electrician.”
- the computing system may apply a machine learning model (e.g., a large language model) to the natural language user input to generate a set of instructions, e.g., new code, that provides corresponding user interfaces, graphical components (e.g., widgets), and/or a user’s desired application functionality.
- the computing system may generate instructions for displaying a new graphical component (e.g., a widget) that includes information pertaining to the user’s current balance.
- the computing system may apply the machine learning model to the natural language user input to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories. For example, the machine learning model may identify “Send message to Jenny to arrange childcare,” and “book doctor’s appointment for Jane” as tasks associated with a “Family” category, and may identify “schedule the meeting with John” as a task associated with a “Work” category.
- the computing system may apply the machine learning model to the identified tasks to generate a set of instructions that provides corresponding graphical user interfaces, graphical components, and/or application functionality for completing the identified tasks.
- the disclosure is directed toward a method that includes retrieving, by a computing system, information associated with a plurality of functions included in one or more applications, and receiving, by the computing system, an indication of a natural language user input associated with the plurality of functions included in the one or more applications.
- the method further includes applying, by the computing system, and using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
- the disclosure is directed toward a computing system comprising one or more processors, and one or more storage devices that store instructions.
- the instructions when executed by the one or more processors, cause the one or more processors to retrieve information associated with a plurality of functions included in one or more applications, and receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications.
- the instructions further cause the one or more processors to apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
- the disclosure is directed toward a non-transitory computer-readable storage medium encoded with instructions.
- the instructions when executed by one or more processors of a computing device, cause the one or more processors to retrieve information associated with a plurality of functions included in one or more applications, and receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications.
- the instructions further cause the one or more processors to apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
- the disclosure is directed toward a computer program product for generating custom graphical components for one or more applications, the computer program product comprising one or more instructions.
- the one or more instructions when executed by at least one processor, cause the at least one processor to retrieve information associated with a plurality of functions included in one or more applications, and receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications.
- the one or more instructions further cause the at least one processor to apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
- FIG. 1 is a conceptual diagram illustrating an example computing system for dynamically generating custom graphical user interfaces for performing one or more tasks identified in natural language input, in accordance with one or more techniques of this disclosure.
- FIG. 2 is a block diagram illustrating another example computing system configured to apply a machine learning module to natural language input to dynamically generate custom graphical user interfaces, in accordance with one or more techniques of this disclosure.
- FIG. 3A is a conceptual diagram illustrating an example training process for a machine learning module, in accordance with one or more techniques of this disclosure.
- FIG. 3B is a conceptual diagram illustrating an example trained machine learning module, in accordance with one or more techniques of this disclosure.
- FIG. 3C is a conceptual diagram illustrating a machine learning module configured to apply a large language model that accepts natural language input and provides code for corresponding graphical user interfaces and application functionality as output, in accordance with one or more techniques of this disclosure.
- FIG. 4 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- FIG. 5 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- FIG. 6 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- FIG. 7A is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- FIG. 7B is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- FIG. 8 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- FIG. 9 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality to a companion device, in accordance with one or more techniques of this disclosure.
- FIG. 10 is a flowchart illustrating an example operation for dynamically generating custom graphical user interfaces for one or more applications, in accordance with one or more techniques of this disclosure.
- FIG. 11 is a flowchart illustrating another example operation for dynamically generating custom graphical user interfaces for one or more applications, in accordance with one or more techniques of this disclosure.
- FIG. 1 is a conceptual diagram illustrating an example computing system for dynamically generating custom graphical user interfaces for performing one or more tasks identified in natural language input, in accordance with one or more techniques of this disclosure.
- a user 120 interacts with computing device 112 that is in communication with computing system 100.
- in some examples, some or all of the components and/or functionality attributed to computing system 100 may be implemented or performed by computing device 112.
- computing system 100 may be implemented on a plurality of computing devices that may include, but are not limited to, portable, mobile, or other devices, such as mobile phones (including smartphones), laptop computers, desktop computers, tablet computers, smart television platforms, server computers, mainframes, etc.
- computing system 100 may represent a cloud computing system that provides one or more services via network 101. That is, in some examples, computing system 100 may be a distributed computing system.
- Network 101 may include any public or private communication network, such as a cellular network, Wi-Fi network, a direct cell-to-satellite communication network, or other type of network for transmitting data between computing system 100 and computing device 112.
- network 101 may represent one or more packet switched networks, such as the Internet.
- Computing device 112 may send and receive data to and from computing system 100 across network 101 using any suitable communication techniques.
- computing system 100 and computing device 112 may each be operatively coupled to network 101 using respective network links.
- Network 101 may include network hubs, network switches, network routers, etc., that are operatively inter-coupled thereby providing for the exchange of information between computing device 112 and computing system 100.
- network links of network 101 may be Ethernet, ATM or other network connections. Such connections may include wireless and/or wired connections.
- computing device 112 includes one or more user interface (UI) components (“UI components 102”).
- UI components 102 of computing device 112 may be configured to function as input devices and/or output devices for computing device 112.
- UI components 102 may be implemented using various technologies.
- UI components 102 may be configured to receive input from user 120 through tactile, audio, and/or video feedback.
- input devices include a presence-sensitive display, a presence-sensitive or touch-sensitive input device (such as that shown in FIG. 1), a mouse, a keyboard, a voice responsive system, video camera, microphone or any other type of device for detecting a command from user 120.
- a presence-sensitive display includes a touch-sensitive or presence-sensitive input screen, such as a resistive touchscreen, a surface acoustic wave touchscreen, a capacitive touchscreen, a projective capacitive touchscreen, a pressure sensitive screen, an acoustic pulse recognition touch screen, or another presence-sensitive technology.
- UI components 102 of computing device 112 may include a presence-sensitive device that may receive tactile input from user 120.
- UI components 102 may receive indications of the tactile input by detecting one or more gestures from user 120 (e.g., when user 120 touches or points to one or more locations of UI components 102 with a finger or a stylus pen).
- UI components 102 may additionally or alternatively be configured to function as an output device by providing output to user 120 using tactile, audio, or video stimuli.
- output devices include a sound card, a video graphics adapter card, or any of one or more display devices, such as a liquid crystal display (LCD), dot matrix display, light emitting diode (LED) display, microLED, miniLED, organic light-emitting diode (OLED) display, e-ink, or similar monochrome or color display capable of outputting visible information to user 120.
- Additional examples of an output device include a speaker, a haptic device, or other device that can generate intelligible output to a user.
- UI components 102 may present output to user 120 as a graphical user interface that may be associated with functionality provided by computing device 112.
- UI components 102 may present various user interfaces of applications executing at or accessible by computing device 112 (e.g., an electronic message application, an Internet browser application, etc.).
- User 120 may interact with a respective user interface of an application to cause computing device 112 to perform operations relating to a function provided by the application.
- UI components 102 of computing device 112 may detect two-dimensional and/or three-dimensional gestures as input from user 120. For instance, a sensor of UI components 102 may detect the user's movement (e.g., moving a hand, an arm, a pen, a stylus, etc.) within a threshold distance of the sensor of UI components 102. UI components 102 may determine a two- or three-dimensional vector representation of the movement and correlate the vector representation to a gesture input (e.g., a hand-wave, a pinch, a clap, a pen stroke, etc.) that has multiple dimensions.
- UI components 102 may, in some examples, detect a multi-dimensional gesture without requiring the user to gesture at or near a screen or surface at which UI components 102 output information for display. Instead, UI components 102 may detect a multi-dimensional gesture performed at or near a sensor which may or may not be located near the screen or surface at which UI components 102 output information for display.
- computing system 100 includes user interface (UI) module 104.
- Module 104 may perform operations described herein using hardware, software, firmware, or a mixture thereof residing in and/or executing at computing system 100.
- Computing system 100 may execute module 104 with one processor or with multiple processors.
- computing system 100 may execute module 104 as a virtual machine executing on underlying hardware.
- Module 104 may execute as one or more services of an operating system or computing platform or may execute as one or more executable programs at an application layer of a computing platform.
- UI module 104 may be operable by computing system 100 to perform one or more functions, such as receive input and send indications of such input to other components associated with computing system 100.
- UI module 104 may also receive data from components associated with computing system 100. Using the data received, UI module 104 may cause other components associated with computing system 100, such as UI components 102, to provide output based on the data. For instance, UI module 104 may send data to UI components 102 of computing device 112 to display a graphical user interface (GUI), such as GUI 116.
- user 120 may be provided with an opportunity to provide input to control whether programs or features of computing device 112 and/or computing system 100 can collect and make use of user information (e.g., user 120’s personal data, information about user 120’s current location, location history, activity, etc.), or to dictate whether and/or how computing device 112 and/or computing system 100 may receive content that may be relevant to user 120.
- user information may include data that includes the context of user usage, either obtained from an application itself or from other sources. Examples of usage context may include breadth of share (sharing publicly, or with a large group, or privately, or a specific person), context of share, etc.
- additional data can include the state of the device, e.g., the location of the device, the apps running on the device, etc.
- certain data may be treated in one or more ways before it is stored or used by computing device 112 and/or computing system 100 so that personally identifiable information is removed.
- a user’s identity may be treated so that no personally identifiable information can be determined about the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined.
- user 120 may have control over how information is collected about them and used by computing device 112 and/or computing system 100.
- user 120 may be prompted by computing device 112 to provide explicit consent for computing device 112 and/or computing system 100 to retrieve and/or store any or all of user 120’s data.
- an action log maintained on computing device 112 may provide user 120 a ledger of activity, which may show any automations or applications running in the background of computing device 112, as well as an accurate log of all UI generator module 108 activity.
- GUI 116 may be an example representation of a mobile phone home screen.
- GUI 116 may include a plurality of user interface elements.
- GUI 116 includes user interface elements 115A-115I, which may be referred to as “widgets”.
- a widget may be a smaller GUI or GUI element that provides specific functionality or access to a larger application.
- GUI 116 includes widgets 115A-115I, which may provide user 120 access to one or more applications.
- widget 115A may be a widget for a messaging application, in which, responsive to user 120 clicking on widget 115A, computing device 112 may open the messaging application for user 120.
- Widget 115B may be a widget for a banking application.
- Widget 115C may be a widget for a social media application, and widget 115D may be a widget for an Internet browser.
- computing device 112 may include one or more applications, which may be accessed via one or more widgets displayed on GUI 116.
- the “plurality of functions” described herein may be functions, or “functionality”, e.g., capabilities or features of an application, that are provided by the values, settings, or other data that are directly embedded into the source code of an application, rather than those that are dynamically generated or configurable at runtime.
- the “plurality of functions” may include functionality provided by values, logic, etc. that are fixed, e.g., “hard-coded”, in an application’s source code, and cannot be easily changed without modifying the code itself.
- the “plurality of functions” may be considered statically defined functions, or functions that are predefined at compile time or build time and do not change during execution.
- the “information associated with a plurality of functions” described herein may refer to data that can be retrieved, e.g., via an API, from one or more applications installed on a computing device, such as computing device 112.
- an application may include an API that enables external applications or modules to interact with and use the data stored by the application.
- the “information associated with a plurality of functions included in one or more applications” may be defined as data associated with the predefined or statically defined functionality of the one or more applications, e.g., an API response.
- a banking application may include predefined or statically defined functionality for displaying a current balance of a user’s bank account.
- API module 106 may use the banking application API to retrieve the information associated with the plurality of functions, which may include, for example, a value for the current balance of the user’s bank account, but may not include all of the predefined or statically defined functionality or logic for determining and displaying the value for the current balance of the user’s bank account.
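To make the retrieval step concrete, the following is a minimal sketch, not taken from the published application, of how an API module might fetch data associated with an application's predefined functions over REST; the endpoint path and response shape are assumptions.

```javascript
// Minimal sketch of an API-module retrieval step. The endpoint path and
// the response shape are hypothetical; the disclosure only requires that
// data associated with predefined functions be retrievable, e.g., via REST.
async function retrieveFunctionInfo(appBaseUrl) {
  const response = await fetch(`${appBaseUrl}/api/v1/functions`, {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`API request failed: ${response.status}`);
  }
  // Illustrative response, e.g., from a banking application:
  // { functions: [{ name: "showBalance", data: { balance: 1234.56 } }] }
  return response.json();
}
```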
- the one or more applications may be considered to include a plurality of predefined functions.
- a calculator application may include predefined functionality for performing various arithmetic and mathematical operations
- a browser application may include predefined functionality for accessing and browsing the Internet
- a banking application may include predefined functionality for transferring funds, etc.
- many applications executed on computing devices may include predefined functionality for performing various tasks, such as responding to messages, scheduling appointments, booking reservations, browsing the Internet, etc.
- a user wants to book a dinner reservation, they may use a dining application to reserve a table at a particular restaurant.
- the user may also need to use a calendar application to determine what date and time they can book the reservation for, use a map application to find local restaurants, use a web browser application to find reviews for a restaurant, use a messaging application to determine if any friends would like to join the dinner reservation, etc.
- a user may have to navigate through multiple applications, which may be time-consuming and frustrating for a user.
- a user may find it difficult to complete tasks due to information being stored across multiple different applications, and due to information having the potential to change over time (e.g., a user may book a reservation at 7:00 PM, but later receive a message from a friend saying that time no longer works with their schedule).
- the techniques described herein may provide custom user interfaces and widgets that are dynamically generated based on identified tasks, in which the custom user interfaces may be organized into different categories, and the custom widgets may enable users to access desired functionality for performing the identified tasks.
- user 120 may simply say their intent or command, i.e., provide natural language input 114, and computing system 100 may provide instructions for generating the multiple organized user interfaces with widgets for completing the multiple tasks.
- computing system 100 may include a user interface generator module 108 that applies a large language model to natural language input in order to dynamically generate custom user interfaces and functionality for performing various tasks.
- user interface generator module 108 may retrieve, via API module 106, information (e.g., API response data) associated with the plurality of functions included in the one or more applications executing at computing device 112, such as applications associated with and/or accessed via widgets 115A-115I.
- user interface generator module 108 may run continuously in the background of computing device 112 and be configured to monitor the content of one or more applications executing at computing device 112 and/or user activity within computing device 112.
- API module 106 receives explicit consent from user 120 to gather information from user 120 and one or more applications executing on computing device 112 operated by user 120.
- user interface generator module 108 may receive an indication of a natural language user input 114 associated with one or more predefined or already available functions included in the one or more applications, again provided that user 120 has given explicit permission for computing system 100 to monitor/receive user 120’s data.
- API module 106, which can be considered an API library, may include multiple APIs that can be used to access one or more application APIs.
- API module 106 may provide information about user interface elements, events, and actions to assistive technologies (e.g., screen readers, magnification gestures, switch devices, etc.) provided by computing system 100 or computing device 112.
- API module 106 may be configured to enable the exchanging of data in a standardized format.
- API module 106 may support REST (Representational State Transfer), which is a widely-used architectural style for building APIs that use HTTP (Hypertext Transfer Protocol) to exchange data between applications.
- API module 106 may be configured to generate a stream of accessibility events as the user interacts with computing device 112 and applications executed on computing device 112.
- these events may represent actions and changes in a user interface, such as button presses, text changes, and screen transitions.
- user interface generator module 108 may receive and analyze these events to better understand how user 120 interacts with an application executing on computing device 112.
- API module 106 may be configured to retrieve accessibility actions from applications executed on computing device 112.
- “Accessibility actions” may refer to different types of inputs that can be detected at a location associated with a UI component 102, such as mechanical inputs (e.g., a clicking of a button, a swiping of a screen, etc.), audio input (e.g., verbal command), or gesture control (e.g., triple tapping on a screen, hand wave, assistive gestures, etc.).
- accessibility actions may provide users the ability to interact with an application or user interface element in multiple ways according to their needs.
- computing system 100 may determine which accessibility actions are frequently performed by user 120 when interacting with a GUI or application such that the new user interface generated by user interface generator module 108 can be better tailored for user 120’s needs.
- the information retrieved by API module 106 from computing device 112 may be stored by computing system 100 to identify potential accessibility issues and/or better understand how user 120 interacts with computing device 112.
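As one hedged illustration of the frequency analysis described above, the sketch below tallies retrieved accessibility-action events; the `{ action: string }` event shape and the function name are assumptions, not an actual platform API.

```javascript
// Tally accessibility-action events so that frequently performed actions
// can inform the generated UI. The event shape is assumed.
function tallyActions(events) {
  const counts = new Map();
  for (const event of events) {
    counts.set(event.action, (counts.get(event.action) ?? 0) + 1);
  }
  // Sorted descending by frequency.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Example usage: logs [["tripleTap", 2], ["swipe", 1]]
console.log(tallyActions([
  { action: "tripleTap" },
  { action: "swipe" },
  { action: "tripleTap" },
]));
```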
- user interface generator module 108 may use information retrieved from computing device 112 to determine the format, size, color scheme, accessibility features, or any other features to include in the set of instructions (e.g., new code) for generating new graphical user interfaces, components, and functionality for performing tasks.
- user interface generator module 108 may also provide users the ability to configure various accessibility and/or display options according to their needs. For example, user 120 may be able to adjust user interface elements of a GUI (such as text size), enable color correction, set up magnification gestures, and configure gesture-based navigation.
- user interface generator module 108 may send information (e.g., location information, other contextual information, etc.) to ML module 110 only if computing system 100 receives permission from the user of computing device 112 to send the information.
- the user may be provided with an opportunity to control whether programs or features of computing system 100 can collect user information (e.g., information about a user’s social network, a user’s social actions or activities, a user’s profession, a user’s preferences, or a user’s current location), or to control whether and/or how computing system 100 and/or computing device 112 may store and share user information.
- certain data may be treated in one or more ways before it is stored, transmitted, or used so that personally identifiable information is removed.
- a user’s identity may be treated so that no personally identifiable information can be determined about the user.
- the user may have control over how information is collected about the user and stored, transmitted, and/or used in accordance with techniques of this disclosure.
- user interface generator module 108 may receive, from computing device 112, and provided that user 120 has given explicit consent, an indication of a natural language user input 114 (e.g., audio or text input from user 120) associated with the plurality of functions included in the one or more applications.
- the indication of a natural language user input may represent user 120’s command or intent, and/or desired functionality for one or more applications.
- natural language user input 114 may include a natural language utterance such as “Send money to Mike, book Jane’s appointment . . .”
- natural language user input 114 may represent user 120’s commands and/or desires for performing one or more tasks, such as transferring funds, booking an appointment, viewing their bank account balance, etc.
- user 120 may provide natural language input that represents any number of commands or intents. That is, user 120 may say aloud any number of tasks in a single utterance, which may include tasks pertaining to different functionality included in different applications.
- API module 106 may be configured to retrieve information (e.g., data) using one or more application APIs for the applications executing on computing device 112, which user interface generator module 108 may interpret in order to understand the functionality provided by the one or more applications. User interface generator module 108 may further use the retrieved information to contextualize the indication of natural language user input 114 when applying machine learning module 110.
- a natural language user input may include a natural language utterance such as, “Send the money to Mike.” In this example, while Mike is explicitly deemed the recipient of the money, the user has not specified an amount of money to send.
- user interface generator module 108 may retrieve, using API module 106, information associated with predefined functions included in, for example, a messaging application and a banking application.
- User interface generator module 108 may receive, with explicit consent from user 120, data from the applications, such as the content of a message received within the messaging application, and a list of a user’s trusted contacts stored within the banking application.
- User interface generator module 108 may retrieve, for example, data indicative of a message received from Mike R. that includes the phrase, “Can you send me $20?”, and a username associated with Mike R.’s banking application profile. Therefore, computing system 100 may determine that the input command of “Send the money to Mike” indicates a task of sending $20 to Mike R. using the functionality of the banking application.
- computing system 100 may perform tasks using context information and/or user data sourced from one or more applications included in computing device 112. In this way, in some examples, computing system 100 may interpolate natural language input without having to request that users provide additional input for clarification.
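A minimal sketch of this kind of interpolation follows, assuming hypothetical message and contact shapes; the disclosure does not prescribe a specific resolution algorithm.

```javascript
// Resolve an underspecified "Send the money to Mike" command using context
// retrieved from a messaging application and a banking application. All
// data shapes (name, sender, text, bankingUsername) are illustrative.
function resolveSendMoneyTask(command, messages, trustedContacts) {
  // Match the first name mentioned in the command against trusted contacts.
  const contact = trustedContacts.find((c) =>
    command.toLowerCase().includes(c.name.split(" ")[0].toLowerCase())
  );
  if (!contact) return null; // fall back to asking the user for clarification
  // Look for a requested amount in messages from that contact,
  // e.g., "Can you send me $20?".
  const match = messages
    .filter((m) => m.sender === contact.name)
    .map((m) => m.text.match(/\$(\d+(?:\.\d{2})?)/))
    .find((m) => m !== null);
  return match
    ? { recipient: contact.bankingUsername, amount: Number(match[1]) }
    : null;
}
```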
- user interface generator module 108 may apply machine learning module 110, which may include a language model configured to perform natural language processing techniques, to the indication of natural language user input 114 to identify one or more tasks.
- a prompt may be provided to machine learning module 110 along with the user input, e.g., a string input such as “Only output in the specified format, no comments or explanations. A user has dictated the following to-do items: [Send money to Mike and book Jane’s appointment. Ring doctor to reschedule appointment. Where is the dinner reservation?] The punctuation is not correct, there might be missing periods between items and some items may have been incorrectly combined. Please correct the punctuation and split it into separate items.”
- machine learning module 110 may parse through the indication of natural language user input 114 to identify a first task, “Send money to Mike,” and a second task, “Book Jane’s appointment.”
- machine learning module 110 may parse through input including any amount of data, i.e., machine learning module 110 may identify any number of tasks in a single natural language user input 114.
- the output of machine learning module 110 may be in a structured format or a semi-structured format.
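For illustration, the task-splitting step might be wired up as follows; the model client is passed in as a hypothetical parameter, and the one-task-per-line output format is an assumption consistent with the structured output described above.

```javascript
// Sketch of the task-splitting step. `callLanguageModel` is a hypothetical
// async client for machine learning module 110, supplied by the caller;
// the prompt paraphrases the example given above.
async function splitIntoTasks(utterance, callLanguageModel) {
  const prompt =
    "Only output in the specified format, no comments or explanations. " +
    `A user has dictated the following to-do items: [${utterance}] ` +
    "The punctuation may be incorrect. Please correct the punctuation " +
    "and split it into separate items, one per line.";
  const output = await callLanguageModel(prompt);
  // Assume one task per line; strip any leading list markers.
  return output
    .split("\n")
    .map((line) => line.replace(/^[-\s]+/, "").trim())
    .filter(Boolean);
}
```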
- machine learning module 110 may further identify, for each task, one or more associated categories, e.g., “Headspaces.”
- Example categories may include, but are not limited to, “Family,” “Banking,” “Food,” “Travel,” “Friends,” “Leisure,” etc.
- the categories may be customized by a user, and/or may be determined from a predefined list of categories.
- a list of the identified tasks may be provided as input to machine learning module 110 along with a prompt, e.g., a prompt such as “Write your response in the following format: [# Headspace name; - Task name; - Task name; - Task name; # Headspace name; - Task name; - Task name; - Task name;]. Your response should begin with ‘. . .’. Here are some tasks: [- Send money to Mike; - Book Jane’s appointment]. Please group these tasks into headspaces. Headspaces should contain at least 2 tasks. When naming headspaces, use a vibe, gen-z style, ideally 1 word, no more than 2-3 words.”
- machine learning module 110 may determine the first task and the second task to be associated with a “Family” category, as, based on the context information retrieved from computing device 112, machine learning module 110 may determine Mike and Jane to be family members of user 120.
- input provided to machine learning module 110 may include contextual data, or be “injected with memory” that provides context for other input data, such as the identified tasks.
- the determined categories may be sorted based on a respective level of priority.
- an additional input may be provided to machine learning module 110, e.g., a prompt such as “Here are the headspaces and tasks that the user created: [# Adulting; - Ring doctor to reschedule appointment.; - Pay the utility bill.; - Check my savings.; # Squad; - Book Jane’s flu shot.; - Order vests for Jane.; - Ask Jenny if she can look after the kids on Tuesday.; # Travel; - Book a train north.; - Where is the hotel located?; # FixIt; - Ring John the plumber.; - Ask Ian to use sharp sand in the mortar.; - Clean rear cassette.]. Please reorder these headspaces and tasks.”
- machine learning module 110 may sort the categories from a level of highest priority to lowest priority as follows: “Adulting,” “Squad,” “FixIt,” “Travel.” In some examples, a Levenshtein distance algorithm may be used by machine learning module 110 to match the sorted categories and their associated tasks with existing identified tasks.
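Levenshtein distance itself is a standard edit-distance algorithm; a compact implementation, plus the matching step it could support here, might look like the sketch below. The aggregation into `closestTask` is an assumption about how the matching is applied.

```javascript
// Standard Levenshtein edit distance between two strings.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0
    )
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Match a model-reordered task string to the closest existing task.
function closestTask(sortedTask, existingTasks) {
  return existingTasks.reduce((best, t) =>
    levenshtein(sortedTask, t) < levenshtein(sortedTask, best) ? t : best
  );
}
```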
- computing system 100 may apply machine learning module 110 to the one or more identified tasks to generate a set of instructions, e.g., code.
- machine learning module 110 may generate the set of instructions using a large language model, in which the set of instructions may be generated based on one or more of application functionality, capabilities, and/or attributes included in the information associated with the plurality of functions, contextual information (e.g., user data), and user input received by the computing system.
- a prompt may be generated by machine learning module 110, in which the prompt may specify output format (e.g., javascript code), allowed data types, a UI component library that can be used to build an end result UI, an API library including APIs that can be used to retrieve data at runtime (e.g., predefined APIs or “task APIs” configured to retrieve the information associated with the plurality of functions, APIs for accessing sub-LLMs, sub-prompts for disambiguation steps such as “Which Jenny?”, etc.), user input (e.g., the identified tasks), and context information (e.g., relevant user data) such as “The following things were found in the memory of the device, which may or may not be relevant: [- The user has a calendar event in their diary for a doctor's appointment in 2 days time.
- the prompt may include additional instructions.
- an example prompt may include instructions such as, “The user has made a note of a job they want to do. Your job is to present TaskUIComponent(s) to help them get their task done. You're not in charge of completing the task, you're just presenting UI components that will help them get the task done. The to-do item is: ‘Ring doctor to reschedule appointment.’ Note: if it's not possible to present any UI, rather than displaying a 'text-output' component, it's better to raise an error.”
- the set of instructions may be, for example, generated javascript code that returns one or more UI components from the UI component library, in which the UI components may display or use information retrieved by API module 106.
- the UI components from the UI component library may be considered custom user interfaces and widgets that are dynamically generated based on identified tasks, in which the custom user interfaces may be organized into different categories, and the custom widgets may enable users to access functionality for performing the identified tasks.
- the set of instructions may be dynamically generated at runtime based on user input and retrieved information, including data associated with the predefined or statically defined functions, capabilities, or features from the one or more applications. That is, the set of instructions may include dynamically generated or configurable functionality that may adapt or change based on input data and/or other conditions at runtime. In some examples, the set of instructions may include combined functionality, e.g., functions from the one or more applications that are combined with other functions from the one or more applications to provide functionality for performing an identified task. As such, the set of instructions may be considered generated code that provides corresponding graphical user interfaces and application functionality based on user input.
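As a hedged illustration only, generated instructions of this kind might resemble the snippet below; `ui`, `TaskUIComponent`, and the `taskApi` calls are hypothetical stand-ins for the UI component library and task APIs described above, not identifiers from the disclosure.

```javascript
// Illustrative shape of a generated set of instructions: javascript that
// returns a UI component wired to data retrieved at runtime. Every
// identifier here (ui, taskApi, TaskUIComponent) is a hypothetical stand-in.
async function buildSendMoneyWidget(ui, taskApi) {
  // Retrieve data associated with a predefined function at runtime.
  const { balance } = await taskApi.getAccountBalance();
  return ui.TaskUIComponent({
    title: "Send $20 to Mike",
    body: ui.Text(`Current balance: $${balance.toFixed(2)}`),
    action: ui.Button("Send", () =>
      taskApi.sendMoney({ to: "mike_r", amount: 20 })
    ),
  });
}
```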
- the set of instructions may be associated with or provide at least one function for performing a respective task from the one or more tasks, e.g., the set of instructions may provide a user’s desired functionality for completing the one or more tasks.
- computing system 100 may generate new code that provides user 120’s desired functionality, so long as the desired functionality is determined to be a possible functionality for the one or more applications (e.g., machine learning module 110 may determine whether the desired functionality is reasonable for the one or more applications).
- computing system 100 may use data retrieved from the messaging application and the banking application to generate new code that provides functionality for sending $20 to Mike by, for example, user 120 interacting with a single graphical component, such as a button.
- the set of instructions may further include instructions for generating at least one graphical user interface associated with the respective category, in which the at least one GUI associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task. That is, continuing the example, computing system 100 may generate instructions for generating a GUI associated with a “Family” category, in which the “Family” GUI may include a widget that enables user 120 to send $20 to Mike through the click of a button.
- the “category GUIs” described herein may be considered visual spaces each associated with a category from any number of categories (e.g., “Family,” “Work,” “Travel,” etc.), in which each category may be identified by parsing user intent.
- computing system 100 may determine a category from a predetermined list of categories for each identified task.
- computing system 100 may be configured to receive an indication of natural language input 114 based on, for example, a “touch and talk” feature, rather than by the user navigating through the one or more applications. More specifically, in some examples, computing system 100 receives the indication of natural language user input 114 from computing device 112 in response to a gesture detected at a location of a presence-sensitive display of computing device 112, e.g., a location that corresponds to a graphical user interface component used for causing computing system 100 to perform the techniques described herein.
- widget 118 may be a widget designated for triggering the techniques attributed to computing system 100, and may be displayed on a home screen (e.g., GUI 116) of computing device 112.
- user 120 may provide natural language input 114 such as, “Send money to mike, book jane’s appointment,” in which holding down on widget 118 may be a gesture that causes a user interface component 102 (e.g., a microphone) of computing device 112 to capture natural language input 114.
- the gesture may be provided mechanically (such as by pressing a button) or by gesture recognition/control (such as triple tapping on a screen).
- the indication of a gesture may be an audible input, whereby the gesture is provided by user 120 via, for example, voice command.
- the indication of the gesture is provided by user 120 by using gesture control, such as by providing the gestures described above (e.g., a hand-wave, a pinch, a clap, a pen stroke, etc.) or by tapping the screen in a certain manner (e.g., triple tapping the screen). Therefore, the techniques described herein may be executed by computing system 100 in response to an indication of a variety of gestures.
- computing system 100 may generate instructions for performing the tasks based on user 120 performing a simple gesture, such as holding down on widget 118 and speaking their intent. In this way, users may not be required to navigate through applications to find their desired functionality or perform various tasks. That is, the techniques described herein may provide user 120 with a mechanism to “shortcut” the complexity of performing various actions for various tasks.
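In a browser setting, a “touch and talk” trigger could be sketched with the Web Speech API as below; the widget id, the upload function, and the choice of speech API are all assumptions, since the disclosure is not limited to any particular speech interface.

```javascript
// Hedged browser sketch: holding the widget starts speech capture, and
// releasing it stops capture. sendToComputingSystem is a hypothetical
// function that forwards the utterance to computing system 100.
const widget = document.getElementById("assistant-widget"); // assumed id
const Recognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const recognizer = new Recognition();
recognizer.onresult = (event) => {
  const utterance = event.results[0][0].transcript;
  sendToComputingSystem(utterance); // hypothetical upload
};
widget.addEventListener("pointerdown", () => recognizer.start());
widget.addEventListener("pointerup", () => recognizer.stop());
```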
- Computing system 100 may send the set of instructions to computing device 112, in which computing device 112 may use the set of instructions to generate the at least one GUI associated with a respective category.
- computing device 112 may use the set of instructions to generate a “Family” GUI, a “Work” GUI, a “Friends” GUI, etc., in which each GUI further includes at least one graphical component (e.g., a widget) associated with at least one function for performing a respective task.
- the “Family” GUI may include a widget that enables a user to send $20 to Mike by simply clicking a “Send” button included within the widget.
- the set of instructions may include instructions for generating the different GUIs in an order based on a level of importance assigned to each category.
- historical data (e.g., user data) retrieved from computing device 112 may indicate a level of priority for actions and tasks.
- historical data retrieved from computing device 112 may indicate that user 120 frequently sends and receives messages to and from contacts deemed as family members, frequently performs actions within applications that involve said contacts, etc.
- computing system 100 may determine a level of priority for each task associated with a respective category, and may determine, e.g., based on the priority levels of the associated tasks, an overall level of importance for the respective category.
- computing system 100 may determine the “Family” category to have the highest level of importance.
- the set of instructions may include instructions for generating, e.g., the “Family” GUI as a first GUI in an order of GUIs, a “Work” GUI as a second GUI in the order of GUIs, etc.
- GUI 116 may be an example mobile phone home screen, and user 120 may swipe horizontally across GUI 116 to view the “Family” GUI, may swipe horizontally across the “Family” GUI to view the “Work” GUI, and so on.
- each category GUI may be presented as its own screen, and may be presented in an order based on a level of importance, so as to better organize and prioritize the actions for completing a user’s multiple tasks.
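The ordering step could be as simple as the sketch below; treating a category's importance as the highest priority among its tasks is an assumption, since the disclosure leaves the aggregation rule open.

```javascript
// Order category GUIs by importance, derived here from task priority
// levels (highest task priority wins — an assumed aggregation rule).
function orderCategoryScreens(categories) {
  const importance = (c) => Math.max(...c.tasks.map((t) => t.priority));
  return [...categories].sort((a, b) => importance(b) - importance(a));
}

const screens = orderCategoryScreens([
  { name: "Work", tasks: [{ priority: 2 }] },
  { name: "Family", tasks: [{ priority: 5 }, { priority: 1 }] },
]);
// screens[0].name === "Family", so it is rendered as the first
// swipeable screen on the home-screen carousel.
```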
- various aspects of the techniques described in this disclosure may facilitate better user experience with applications executing on user devices.
- smaller, more organized, and customizable widgets that provide users access to functionality of one or more larger applications may reduce the amount of time and effort required by a user to access such functionality when trying to complete tasks.
- the techniques described may also provide more assistance to users with disabilities when interacting with devices and applications.
- the techniques described include generating new code based on user intent, users may be able to personalize the functionality of applications with which they interact without requiring a developer of the application to hard-code additional features or otherwise update the application. Additionally, users may find that organizing tasks based on associated categories is helpful for completing tasks in a less convoluted manner.
- FIG. 2 is a block diagram illustrating another example computing system configured to apply a machine learning module to natural language text and audio, in accordance with one or more techniques of this disclosure.
- computing system 200 includes processors 224, one or more communication channels 230, one or more user interface components (UIC) 232, one or more communication units 228, and one or more storage devices 238.
- Storage devices 238 of computing system 200 may include user interface module 204, and user interface generator module 208.
- user interface generator module 208 further includes API module 206, machine learning module 210, speech-to-text module 226, and instructions storage 222.
- in some examples, some or all of the components and/or functionality attributed to computing system 200 may be implemented or performed by a computing device in communication with computing system 200.
- Computing system 200, user interface module 204, user interface generator module 208, API module 206, machine learning module 210, and user interface (UI) components 202 may be similar if not substantially similar to computing system 100, user interface module 104, user interface generator module 108, API module 106, machine learning module 110, and user interface (UI) components 102 of FIG. 1, respectively.
- the one or more communication units 228 of computing system 200 may communicate with external devices by transmitting and/or receiving data at computing system 200, such as to and from remote computer systems or computing devices.
- Example communication units 228 include a network interface card (e.g., such as an Ethernet card), an optical transceiver, a radio frequency transceiver, or any other type of device that can send and/or receive information.
- Other examples of communication units 228 may be devices configured to transmit and receive signals via Ultrawideband®, Bluetooth®, GPS, 3G, 4G, Wi-Fi®, etc., that may be found in computing devices, such as mobile devices and the like.
- communication channels 230 may interconnect each of the components as shown for inter-component communications (physically, communicatively, and/or operatively).
- communication channels 230 may include a system bus, a network connection (e.g., a wireless connection), one or more inter-process communication data structures, or any other components for communicating data between hardware and/or software locally or remotely.
- I/O devices 234 of computing system 200 may receive inputs and generate outputs. Examples of inputs include tactile, audio, kinetic, and optical input, to name only a few.
- Input devices of I/O devices 234, in one example, may include a touchscreen, a touchpad, a mouse, a keyboard, a voice responsive system, a video camera, buttons, a control pad, a microphone or any other type of device for detecting input from a human or machine.
- Output devices of I/O devices 234 may include a sound card, a video graphics adapter card, a speaker, a display, or any other type of device for generating output to a human or machine.
- User interface module 204 may perform operations described herein using software, hardware, firmware, or a mixture of hardware, software, and firmware residing in and executing on computing system 200 or at one or more other computing devices (e.g., a cloud-based application - not shown).
- modules 204-226 may be included in and executable on a local computing device, such as computing device 112 of FIG. 1. As such, the techniques described herein may all be implemented locally on a computing device.
- Computing system 200 may execute one or more of modules 204-226 with one or more processors 224, or may execute any or part of one or more of modules 204-226 as or within a virtual machine executing on underlying hardware.
- modules 204-226 may be implemented in various ways, for example, as a downloadable or pre-installed application, remotely as a cloud application, or as part of the operating system of computing system 200.
- Other examples of computing system 200 that implement techniques of this disclosure may include additional components not shown in FIG. 2.
- one or more processors 224 may implement functionality and/or execute instructions within computing system 200.
- one or more processors 224 may receive and execute instructions that provide the functionality of UIC 232, communication units 228, one or more storage devices 238 and an operating system to perform one or more operations as described herein.
- one or more processors 224 may receive and execute instructions that provide the functionality of some or all of modules 204-226 to perform one or more operations and various functions described herein.
- the one or more processors 224 include a central processing unit (CPU).
- Examples of the one or more processors 224 include, but are not limited to, a digital signal processor (DSP), a general-purpose microprocessor, a tensor processing unit (TPU), a neural processing unit (NPU), a neural processing engine, a core of a CPU, VPU, GPU, TPU, NPU, or another processing device, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other equivalent integrated or discrete logic circuitry.
- One or more storage devices 238 within computing system 200 may store information, such as information retrieved from a user computing device, or other data discussed herein, for processing during the operation of computing system 200.
- one or more storage devices of storage devices 238 may be a volatile or temporary memory. Examples of volatile memories include random access memories (RAM), dynamic random-access memories (DRAM), static random-access memories (SRAM), and other forms of volatile memories known in the art.
- Storage devices 238, in some examples, may also include one or more computer-readable storage media. Storage devices 238 may be configured to store larger amounts of information for longer terms in non-volatile memory than volatile memory.
- Non-volatile memories include magnetic hard disks, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.
- Storage devices 238 may store program instructions and/or data associated with the modules 204-226 of FIG. 2.
- computing system 200 may retrieve, using API module 206, information (e.g., API response data) associated with a plurality of functions included in one or more applications executing at a computing device.
- UI module 204 may receive an indication of a natural language user input associated with the plurality of functions.
- the plurality of functions may include some or all of the functions that are predefined, e.g., by application developers, in the one or more applications executing at the computing device.
- computing system 200 may retrieve data, e.g., user data, and/or context information from the one or more applications executing at the computing device, and/or the computing device itself.
- the context information may include, but is not limited to, device location data, device information, network information, connectivity information, application usage data, environmental data, user preference data, battery status, sensor data, application permissions, calendar events, notification data, etc.
- the indication of the natural language user input may be associated with one or more functions from the plurality of functions.
- the natural language user input may include an utterance such as, “Call electrician,” which may be associated with functionality for making a phone call, which may already be predefined in a phone application included in a smartphone.
- the indication of the natural language user input may be received by UI module 204 from the computing device in response to a gesture detected at a location of a presence-sensitive display of the computing device.
- a user may use a “touch and talk” feature on the computing device, in which the indication of the natural language user input is captured by the computing device and sent to UI module 204.
- UI module 204 may further interpret the indication or other inputs detected at the computing device.
- UI module 204 may relay information about the inputs detected at the computing device to one or more associated platforms, operating systems, applications, and/or services executing at the computing device to cause the computing device to perform a function.
- UI module 204 may relay information to the computing device in which the computing device may request the user to repeat or clarify the indication or other inputs.
- UI module 204 may determine whether the indication of a natural language user input is associated with one or more functions from the plurality of functions included in the one or more applications executing at the computing device. In other words, UI module 204 may determine whether the indication and/or other inputs are associated with the capabilities and/or functionality of the applications, such that the user’s desired functionality for completing tasks can be generated.
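- Purely as a hedged sketch (the disclosure does not specify how this association is computed), the check might resemble a word-overlap test between the user input and the capability descriptions retrieved as API response data; all names below are hypothetical:
```python
def match_functions(user_input: str, capabilities: dict[str, str]) -> list[str]:
    """Return IDs of functions whose descriptions overlap the user input."""
    input_words = set(user_input.lower().split())
    return [function_id
            for function_id, description in capabilities.items()
            if input_words & set(description.lower().split())]

# Hypothetical capability descriptions retrieved as API response data
capabilities = {"phone.dial": "make a phone call",
                "sms.send": "send a text message"}
print(match_functions("Call electrician", capabilities))  # ['phone.dial']
```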
- For example, UI module 204 may determine that functionality for performing the task of sending a message to Joe M. cannot be generated by computing system 200. UI module 204 may then relay information to the computing device indicating this error, in which case the computing device may further relay this error to the user.
- UI module 204 may also receive information and instructions from one or more associated platforms, operating systems, applications, and/or services executing at the computing device (e.g., user interface generator module 208) for generating a file comprising the set of instructions.
- the set of instructions may provide at least one function for performing a respective task from one or more tasks identified in the user input.
- the set of instructions may further include instructions for generating at least one graphical user interface associated with a respective category, in which at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
- UI module 204 may act as an intermediary between the one or more associated platforms, operating systems, applications, and/or services executing at the computing device and various output devices of the computing device (e.g., speakers, LED indicators, vibrators, etc.) to produce output (e.g., graphical, audible, tactile, etc.) with the computing device.
- user interface generator module 208 may be implemented on a computing device in various ways.
- user interface generator module 208 may be implemented as a downloadable or pre-installed application or “app.”
- user interface generator module 208 may be implemented as part of an operating system of a computing device.
- Instructions storage 222 may be a storage repository for the information associated with the plurality of functions included in the one or more applications executing at the computing device that are retrieved by API module 206.
- the information associated with the plurality of functions may include API response data, in which the API response data is associated with one or more capabilities and/or functionality of an application that are predefined at compile time.
- Instructions storage 222 may also store, with explicit user consent, context data and/or other data (e.g., user data) retrieved from computing device 112 by API module 106.
- Information may be stored in instructions storage 222 for use by other modules of user interface generator module 208, such as machine learning module 210.
- instructions storage 222 may operate, at least in part, as a cache for instructions retrieved from a computing device (e.g., using one or more communication units 228) or other computing devices.
- instructions storage 222 may be configured as a database, flat file, table, or other data structure stored within storage devices 238.
- instructions storage 222 is shared between various modules executing at computing system 200 (e.g., between one or more of modules 204-226 or other modules not shown in FIG. 2).
- a different data repository is configured for a module executing at computing system 200 that requires a data repository. Each data repository may be configured and managed by different modules and may store data in a different manner.
- computing system 200 may receive and store information from a computing device over a specified period of time.
- user interface generator module 208 may receive, from UI module 204, the indication of a natural language user input, which may be an audio or text input from a user operating a computing device.
- speech-to-text module 226 may convert the input into a computer-readable format.
- Speech-to-text module 226 may implement an Automatic Speech Recognition (ASR) system to convert an audio input (e.g., a digital audio signal) into written text.
- speech-to-text module 226 may preprocess the audio input to enhance quality and remove noise by normalizing the audio volume and filtering out any background noise.
- Speech-to-text module 226 may then transform the audio input into a more suitable format and extract features such as Mel-frequency cepstral coefficients (MFCCs), which capture information about the frequency content of the audio signal over short time intervals.
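- A minimal sketch of this MFCC extraction step, assuming the third-party librosa library and a hypothetical audio file; the disclosure does not name a specific toolkit:
```python
import librosa

# Load a hypothetical utterance at a 16 kHz sample rate
audio, sample_rate = librosa.load("utterance.wav", sr=16000)
# Compute 13 MFCCs per short analysis frame, summarizing the
# frequency content of the signal over short time intervals
mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=13)
print(mfccs.shape)  # (13, number_of_frames)
```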
- speech-to-text module 226 may perform acoustic modeling (e.g., with Hidden Markov Models (HMMs)), which may involve training a statistical model that maps the extracted audio features to phonemes. The acoustic model may learn to associate specific audio features with phonemes while taking into account the variations in pronunciation, accents, and speaking styles.
- speech-to-text module 226 may further implement language modeling (e.g., deep learning techniques, such as recurrent neural networks (RNNs) and transformers) to capture and predict a sequence of words or phrases while considering the context in which the words are spoken (e.g., speech-to-text module 226 may use context information received by UI module 204). Speech-to-text module 226 may further use the trained acoustic and language models to decode the audio input and generate a transcription or sequence of words that best match the observed audio features. Speech-to-text module 226 may further implement post-processing techniques (e.g., grammar checks, contextual analysis, spell correction, etc.) to refine the transcription and improve readability and accuracy. Speech-to-text module 226 may then output the transcribed text that represents the audio input to machine learning module 210 for further processing and analysis.
- machine learning module 210 may be configured to interpret both text and audio input received by UI module 204, such as to identify one or more tasks.
- machine learning module 210 may be configured to infer any indication of a natural language user input.
- machine learning module 210 may infer capabilities from user intents.
- machine learning module 210 may search capabilities.
- machine learning module 210 may convert the audio or text input received by UI module 204, the transcribed text output from speech-to-text module 226, and/or any information stored in instructions storage 222 into structured text.
- machine learning module 210 may convert any input or information to Extensible Markup Language (XML) or other structured text types, such as, but not limited to, HTML, JSON, CSV, INI files, etc.
- machine learning module 210 may determine the type of information to include in the structured text representation. More specifically, machine learning module 210 may analyze various application functionality, capabilities, and attributes included in the information stored in instructions storage 222, such as content descriptions, roles, states, actions, and/or other relevant properties of user interface elements, the contextual information associated with the user input, the audio or text input received by UI module 204, and/or the transcribed text output from speech-to-text module 226.
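- One possible shape of such a structured text representation, sketched with Python's standard XML library; the element names are illustrative, not defined by the disclosure:
```python
import xml.etree.ElementTree as ET

# Assemble the transcribed utterance and selected context information
# into XML, one of the structured text types named above
request = ET.Element("request")
ET.SubElement(request, "utterance").text = "Call electrician"
context = ET.SubElement(request, "context")
ET.SubElement(context, "device_location").text = "home"  # hypothetical field
print(ET.tostring(request, encoding="unicode"))
```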
- the received indication of the natural language user input may be preprocessed.
- the information stored in instructions storage 222 may be preprocessed.
- Preprocessing techniques may include extracting one or more additional features from raw data. For example, feature extraction techniques may be applied to the user input or retrieved instructions to generate one or more new, additional features.
- machine learning module 210 may employ a large language model (LLM) that can interpret the indication of a natural language user input and generate a set of instructions associated with a user’s desired application functionality and corresponding graphical user interface.
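- As a hedged sketch of this step, `llm` below stands for any text-generation callable, since the disclosure does not mandate a particular LLM API, and the prompt wording is purely illustrative:
```python
def generate_instruction_set(llm, user_input: str, api_data: str) -> str:
    """Ask an LLM to turn a user request plus retrieved API response
    data into a set of instructions (e.g., an instructions file)."""
    prompt = (
        "Available application functions:\n"
        f"{api_data}\n\n"
        f"User request: {user_input}\n"
        "Generate a set of instructions, including graphical user "
        "interface components, that fulfills the request."
    )
    return llm(prompt)
```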
- machine learning module 210 may implement other machine-learned models that may be used in place of, or in conjunction with, the LLM described with respect to the FIGS.
- Machine learning module 210 may perform various types of natural language processing (NLP) based on the indication of the natural language user input.
- NLP natural language processing
- the indication of the natural language user input, retrieved application information, context information, and/or other data (e.g., user data) received by computing system 200 may be referred to herein as “input data”.
- machine learning module 210 may apply one or more machine learning techniques to the input data.
- machine learning module 210 may apply a language model to the indication of the natural language user input to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories.
- machine learning module 210 may apply a machine learning model to the indication of the natural language user input to identify the one or more categories.
- UI module 204 may receive audio input including an utterance such as “Plan the trip with John next month, book Jane’s appointment, order Jersey for Jack.”
- Speech-to-text module 226 may convert the audio input into a text string, which may then be parsed by a machine learning module 210 to identify a first task, “Plan the trip with John next month,” a second task, “book Jane’s appointment,” and a third task, “order Jersey for Jack.”
- computing system 200 may have previously determined a “Family” category, e.g., a “Family” category GUI may have previously been generated on a user’s computing device.
- machine learning module 210 may determine, e.g., based on information stored in instructions storage 222, the second task and the third task to be associated with the “Family” category. However, machine learning module 210 may determine that the first task is not associated with any previously determined category. As such, machine learning module 210 may further determine a new category for the first task, e.g., by applying a machine learning model that can determine a word or phrase associated with the first task, such as “Trip.” Therefore, in this example, machine learning module 210 may determine the first task to be associated with a “Trip” category, and the second and third tasks to be associated with the “Family” category.
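- A small sketch of this category assignment using the example utterance above; the keyword matching and the fixed "Trip" label stand in for the model calls the disclosure describes:
```python
def assign_categories(tasks: list[str], known: dict[str, set[str]]) -> dict[str, str]:
    """Assign each task to an existing category or a newly determined one."""
    assignments = {}
    for task in tasks:
        words = set(task.lower().replace("'s", "").split())
        # Match against previously determined categories (e.g., "Family")
        category = next((name for name, kw in known.items() if words & kw), None)
        if category is None:
            # In the disclosure, a model determines a new word or phrase
            # such as "Trip"; a fixed placeholder stubs that step here
            category = "Trip" if "trip" in words else "Misc"
            known.setdefault(category, words)
        assignments[task] = category
    return assignments

known = {"Family": {"jane", "jack", "appointment", "jersey"}}
tasks = ["Plan the trip with John next month",
         "book Jane's appointment",
         "order Jersey for Jack"]
print(assign_categories(tasks, known))
# The first task maps to "Trip"; the second and third map to "Family"
```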
- Machine learning module 210 may further apply, using the information associated with the plurality of functions, a machine learning model to the one or more tasks to generate the set of instructions, in which the set of instructions provide at least one function for performing a respective task.
- the set of instructions may provide functionality for performing one or more actions that complete a task.
- multiple actions (which may also be referred to herein as “subtasks”) may be involved, such as sending messages to John, sending funds to John, booking airline tickets, booking accommodations, finding attractions to visit, etc.
- performing a single task and/or subtask may require functionality that has not been predefined by a single application.
- machine learning module 210 may generate code that provides “new” functionality for performing a task and/or subtask.
- machine learning module 210 may use information and existing functionality retrieved from a messaging application, a calendar application, and an airline application to generate new code that provides functionality for purchasing a specific airline ticket based on, for example, messages received from John indicating dates of travel, a user’s schedule for those dates of travel, and historical user data indicating the user’s preferred airline ticket class.
- the set of instructions may further include instructions for generating at least one GUI associated with a respective category, in which the at least one GUI may include at least one graphical component associated with the at least one function for performing the respective task.
- the instructions may include instructions for generating a “Trip” GUI, in which the “Trip” GUI may include a widget for each identified subtask.
- the “Trip” GUI may include a “Book Flight” widget that further includes a “Book This Flight” button for purchasing the specific airline ticket. That is, the “Book This Flight” button may be associated with the new functionality for purchasing the specific airline ticket, and may act as a “shortcut” for the user in performing the task of purchasing the specific airline ticket.
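- The generated instructions might, purely as a hypothetical illustration, carry a structure along these lines (the field names are not defined by the disclosure):
```python
# Hypothetical instructions structure for the "Trip" example above
instructions = {
    "category": "Trip",
    "gui": {
        "widgets": [
            {
                "title": "Book Flight",
                "components": [
                    {"type": "button",
                     "label": "Book This Flight",
                     # Shortcut to the newly generated functionality
                     "action": "airline.purchase_ticket"}  # hypothetical id
                ],
            }
        ]
    },
}
```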
- FIG. 3A is a conceptual diagram illustrating an example training process for a machine learning module, in accordance with one or more techniques of this disclosure.
- computing device 112 of FIG. 1 may store and implement machine learning module 310 locally (i.e., on-device).
- machine learning module 310 can be stored at and/or implemented locally by an embedded device or a user computing device such as a mobile device.
- Output data obtained through local implementation of machine learning module 310 at the embedded device or the user computing device can be used to improve performance of the embedded device or the user computing device (e.g., an application implemented by the embedded device or the user computing device).
- Machine learning module 310 described herein can be trained at a training computing system, and then provided for storage and/or implementation at one or more computing devices, such as computing device 112 of FIG. 1.
- training process 340 executes locally at computing system 100 of FIG. 1.
- training process 340 can be included in or separate from any computing system that implements machine learning module 310.
- machine learning module 310 may be or include one or more inference models, i.e., one or more trained machine learning models that can be used to make predictions based on new, unseen data.
- Machine learning module 310 may “infer” conclusions or outputs, which may be predictions, classifications, recommendations, or other types of decision-making.
- Machine learning module 310 may be trained according to one or more of various different training types or techniques. In some examples, machine learning module 310 may be trained by training process 340 of FIG. 3A.
- machine learning module 310 may be trained on training data 331 that may include input data 333 that has labels 337.
- the training process shown in FIG. 3A is one example training process; other training processes may be used as well.
- machine learning module 310 may learn patterns from training data 331, and training process 340 may optimize parameters for machine learning module 310 to minimize prediction errors.
- Training data 331 can include, upon user permission for use of such data for training, anonymized usage logs of sharing flows, e.g., content items that were shared together, bundled content pieces already identified as belonging together, e.g., from entities in a knowledge graph, etc.
- training data 331 can include examples of input data 333 that have been assigned labels 337 that correspond to output data 335.
- machine learning module 310 can be trained by optimizing an objective function, such as objective function 339.
- objective function 339 may be or include a loss function that compares (e.g., determines a difference between) output data generated by the model from the training data and labels (e.g., ground-truth labels) associated with the training data.
- the loss function can evaluate a sum or mean of squared differences between output data 335 and the labels.
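- For instance, the sum-or-mean-of-squared-differences comparison can be written in a few lines of NumPy; the arrays below are illustrative stand-ins for output data 335 and labels 337:
```python
import numpy as np

output_data = np.array([0.9, 0.2, 0.7])   # model outputs
labels = np.array([1.0, 0.0, 1.0])        # ground-truth labels
# Mean of squared differences between outputs and labels
loss = np.mean((output_data - labels) ** 2)
print(loss)  # ~0.0467
```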
- objective function 339 may be or include a cost function that describes a cost of a certain outcome or output data.
- Other examples of objective function 339 can include margin-based techniques such as, for example, triplet loss or maximum-margin training.
- optimization techniques can be performed to optimize objective function 339.
- the optimization technique(s) can minimize or maximize objective function 339.
- Example optimization techniques include Hessian-based techniques and gradient-based techniques, such as, for example, coordinate descent; gradient descent (e.g., stochastic gradient descent); subgradient methods; etc.
- Other optimization techniques include black box optimization techniques and heuristics.
- backward propagation of errors can be used in conjunction with an optimization technique (e.g., gradient-based techniques) to train machine learning module 310 (e.g., when a machine-learned model is a multi-layer model such as an artificial neural network).
- an iterative cycle of propagation and model parameter (e.g., weights) update can be performed to train machine learning module 310.
- Example backpropagation techniques include truncated backpropagation through time, Levenberg-Marquardt backpropagation, etc.
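- A minimal sketch of this propagate-and-update cycle, assuming PyTorch (the disclosure does not mandate a framework); the model and data are illustrative:
```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # stand-in for a multi-layer model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

for _ in range(100):            # iterative training cycle
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()             # backward propagation of errors
    optimizer.step()            # gradient-based parameter update
```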
- machine learning module 310 described herein can be trained using unsupervised learning techniques.
- Unsupervised learning can include inferring a function to describe hidden structure from unlabeled data. For example, a classification or categorization may not be included in the data.
- Unsupervised learning techniques can be used to produce machine-learned models capable of performing clustering, anomaly detection, learning latent variable models, or other tasks.
- Machine learning module 310 can be trained using semi-supervised techniques which combine aspects of supervised learning and unsupervised learning.
- Machine learning module 310 can be trained or otherwise generated through evolutionary techniques or genetic algorithms.
- machine learning module 310 described herein can be trained using reinforcement learning.
- In reinforcement learning, an agent (e.g., a model) can interact with an environment and learn, from reward signals, a policy for selecting actions.
- Reinforcement learning can differ from the supervised learning problem in that correct input/output pairs are not presented, nor sub-optimal actions explicitly corrected.
- one or more generalization techniques can be performed during training to improve the generalization of machine learning module 310.
- Generalization techniques can help reduce overfitting of machine learning module 310 to the training data.
- Example generalization techniques include dropout techniques; weight decay techniques; batch normalization; early stopping; subset selection; stepwise selection; etc.
- machine learning module 310 described herein can include or otherwise be impacted by a number of hyperparameters, such as, for example, learning rate, number of layers, number of nodes in each layer, number of leaves in a tree, number of clusters; etc.
- Hyperparameters can affect model performance. Hyperparameters can be hand selected or can be automatically selected through application of techniques such as, for example, grid search; black box optimization techniques (e.g., Bayesian optimization, random search, etc.); gradient-based optimization; etc.
- Example techniques and/or tools for performing automatic hyperparameter optimization include Hyperopt; Auto-WEKA; Spearmint; Metric Optimization Engine (MOE); etc.
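- As one concrete (but not mandated) illustration of automatic hyperparameter selection, a grid search over a single hyperparameter with scikit-learn:
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # toy data
# Evaluate each candidate value of C with 3-fold cross-validation
grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```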
- various techniques can be used to optimize and/or adapt the learning rate when the model is trained.
- Example techniques and/or tools for performing learning rate optimization or adaptation include Adagrad; Adaptive Moment Estimation (ADAM); Adadelta; RMSprop; etc.
- transfer learning techniques can be used to provide an initial model from which to begin training of machine learning module 310 described herein.
- transfer learning involves reusing a model and its model parameters obtained while solving one problem and applying them to a different but related problem. Models trained on very large data sets may be retrained or fine-tuned on additional data. Often, all model designs and their parameters of a source model are copied except the output layer(s). The output layer(s) are often called the head, and the other layers are often called the base.
- the source parameters may be considered to contain the knowledge learned from the source dataset, and this knowledge may also be applicable to a target dataset. Fine-tuning may include updating the head parameters while the base parameters are fixed or updated in a later step.
- machine learning module 310 may be trained in an offline fashion or an online fashion.
- In offline training (also known as batch learning), machine learning module 310 is trained on the entirety of a static set of training data.
- In online training, machine learning module 310 is continuously trained (or re-trained) as new training data becomes available (e.g., while the model is used to perform inference).
- training process 340 may involve centralized training of machine learning module 310 (e.g., based on a centrally stored dataset).
- decentralized training techniques such as distributed training, federated learning, or the like can be used to train, update, or personalize machine learning module 310.
- Machine learning module 310 described herein can be trained according to one or more of various different training types or techniques.
- machine learning module 310 can be trained by training process 340 using supervised learning, in which machine learning module 310 is trained on a training dataset that includes instances or examples that have labels.
- the labels can be manually applied by experts, generated through crowd-sourcing, or provided by other techniques (e.g., by physics-based or complex mathematical models).
- the training examples can be provided by the user computing device. In some examples, this process can be referred to as personalizing the model.
- machine learning module 310 includes a language model that may be trained (e.g., pre-trained, fine-tuned, etc.) by training process 340.
- training process 340 may pre-train a language model on a large and diverse corpus of text.
- training data 331 may include a dataset that covers a wide range of topics and domains to ensure machine learning module 310 learns diverse linguistic patterns and contextual relationships.
- Training process 340 may train a language model to optimize objective function 339.
- Objective function 339 may be or include a loss function, such as cross-entropy loss, that compares (e.g., determines a difference between) output data generated by the model from training data 331 and labels 337 (e.g., ground-truth labels) associated with training data 331.
- objective function 339 for a language model may be to correctly predict the next word in a sequence of words or correctly fill in missing words as much as possible.
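- Sketched with PyTorch, the cross-entropy comparison for a single next-word prediction might look as follows; the vocabulary size and tensors are illustrative:
```python
import torch
import torch.nn as nn

vocab_size = 1000
logits = torch.randn(1, vocab_size)   # model scores over the vocabulary
target = torch.tensor([42])           # index of the ground-truth next word
loss = nn.CrossEntropyLoss()(logits, target)
print(loss.item())
```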
- training process 340 may use techniques such as low-rank adaptation (LoRA) to train or fine-tune large language models (LLMs) implemented by machine learning module 310.
- LoRA may reduce the number of trainable parameters by freezing pre-trained weights of an LLM and injecting small, trainable low-rank matrices that adapt the model for specific tasks.
- LoRA may be useful when a model needs to be adapted to multiple tasks with limited task-specific data. That is, training process 340 may use LoRA for task-specific fine-tuning.
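- A brief sketch of LoRA fine-tuning using the Hugging Face PEFT library as one possible realization; the base model and target modules are illustrative choices:
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")   # illustrative base LLM
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"])        # GPT-2 attention projection
model = get_peft_model(base, config)   # pre-trained weights stay frozen
model.print_trainable_parameters()     # only the low-rank matrices train
```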
- training process 340 may use techniques such as retrieval-augmented generation (RAG), which is a hybrid framework that combines information retrieval with text generation.
- RAG may be used to fine-tune a generative model implemented by machine learning module 310 by retrieving relevant information from an external database or dataset (e.g., a large and diverse corpus of text) and using that information to generate output that is more accurate and informative. RAG may be useful for generating more factually accurate and contextually relevant summaries and responses to questions.
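- At a high level, the retrieve-then-generate flow can be sketched as below, where `retrieve` and `llm` stand for an arbitrary retriever and generative model, neither of which the disclosure pins down:
```python
def rag_answer(question: str, retrieve, llm) -> str:
    """Combine information retrieval with text generation (RAG)."""
    passages = retrieve(question, top_k=3)    # retrieval from an external dataset
    context = "\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)                        # generation grounded in retrieval
```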
- training process 340 may continuously or periodically train a language model included in machine learning module 310.
- training process 340 may fine-tune a language model by using feedback in the training process.
- UI component 202 of FIG. 2 may receive a user input via a computing device that selects feedback (e.g., thumbs up, thumbs down, etc.) relating to the generated application functionality and associated GUIs that are presented to the user on the computing device.
- the feedback may indicate whether the generated application functionality and associated GUIs are accurate or inaccurate, correct or incorrect, high quality or low quality, etc.
- UI module 204 may receive this feedback and may send it to user interface generator module 208.
- User interface generator module 208 may transmit the feedback to machine learning module 310 (specifically to training process 340), in which training process 340 uses the feedback for training.
- training process 340 may convert the feedback into labeled data for supervised training.
- training process 340 may fine-tune a language model by monitoring the relationship between the performance of the language model and user feedback, and iterate the fine-tuning process as necessary (e.g., to receive more positive user feedback and less negative user feedback).
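- An illustrative conversion of thumbs-up/thumbs-down feedback into labeled examples for this supervised fine-tuning step; the record fields are hypothetical:
```python
def feedback_to_examples(records: list[dict]) -> list[tuple[str, int]]:
    """Turn user feedback records into (instructions, label) pairs."""
    label_map = {"thumbs_up": 1, "thumbs_down": 0}
    return [(r["generated_instructions"], label_map[r["feedback"]])
            for r in records
            if r.get("feedback") in label_map]
```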
- the techniques of this disclosure may establish a feedback loop that continuously improves the quality of output data 335 (e.g., an instructions file) of a language model.
- FIG. 3B is a conceptual diagram illustrating an example trained machine learning module, in accordance with one or more techniques of this disclosure.
- computing device 112 of FIG. 1 may store and implement machine learning module 310 locally (i.e., on-device).
- machine learning module 310 can be stored at and/or implemented locally by an embedded device or a user computing device such as a mobile device.
- Output data obtained through local implementation of machine learning module 310 at the embedded device or the user computing device can be used to improve performance of the embedded device or the user computing device (e.g., an application implemented by the embedded device or the user computing device).
- Machine learning module 310 of FIG. 3B may be trained at a computing system, such as computing system 100 of FIG. 1, and then provided for storage and/or implementation at one or more computing devices, such as computing device 112 of FIG. 1.
- machine learning module 310 executes locally at computing system 100 of FIG. 1.
- computing system 100 may perform machine learning as a service.
- machine learning module 310 is trained (e.g., via training process 340 of FIG. 3A) to receive input data 333, which may be of one or more types and, in response, provide output data 335, which may be of one or more types.
- FIG. 3B illustrates machine learning module 310 performing inference, in which machine learning module 310 may use learned patterns to make predictions or decisions on new data, e.g., input data 333.
- Machine learning module 310 may include one or more machine-learned models trained by training process 340 of FIG. 3A.
- Input data 333 may include one or more features that are associated with an instance or an example.
- the one or more features associated with the instance or example can be organized into a feature vector.
- output data 335 can include one or more predictions. Predictions can also be referred to as inferences.
- machine learning module 310 can output a prediction for such instance based on the features.
- Machine learning module 310 can be or include one or more of various different types of machine-learned models.
- machine learning module 310 may perform NLP tasks.
- Machine learning module 310 may summarize, translate, or organize input data 333.
- Machine learning module 310 may use recurrent neural networks (RNNs) and/or transformer models (self-attention models).
- Example models may include, but are not limited to, GPT-3, BERT, Gemini (e.g., Gemini Ultra, Gemini Pro, Gemini Flash, Gemini Nano), Android AICore, and T5.
- machine learning module 310 may perform classification, summarization, name generation, regression, clustering, anomaly detection, recommendation generation, and/or other tasks.
- machine learning module 310 can perform various types of classification based on input data 333.
- machine learning module 310 can perform binary classification or multiclass classification.
- binary classification output data 335 can include a classification of input data 333 into one of two different classes.
- multiclass classification output data 335 can include a classification of input data 333 into one (or more) of more than two classes.
- the classifications can be single label or multi-label.
- Machine learning module 310 may perform discrete categorical classification in which input data 333 is simply classified into one or more classes or categories.
- machine learning module 310 can perform classification in which machine learning module 310 provides, for each of one or more classes, a numerical value descriptive of a degree to which it is believed that input data 333 should be classified into the corresponding class.
- the numerical values provided by machine learning module 310 can be referred to as “confidence scores” that are indicative of a respective confidence associated with classification of the input into the respective class.
- the confidence scores can be compared to one or more thresholds to render a discrete categorical prediction. In some examples, only a certain number of classes (e.g., one) with the relatively largest confidence scores can be selected to render a discrete categorical prediction.
- Machine learning module 310 may output a probabilistic classification. For example, machine learning module 310 may predict, given a sample input, a probability distribution over a set of classes. Thus, rather than outputting only the most likely class to which the sample input should belong, machine learning module 310 can output, for each class, a probability that the sample input belongs to such class. In some examples, the probability distribution over all possible classes can sum to one. In some examples, a Softmax function, or other type of function or layer can be used to squash a set of real values respectively associated with the possible classes to a set of real values in the range (0, 1) that sum to one.
- the probabilities provided by the probability distribution can be compared to one or more thresholds to render a discrete categorical prediction. In some examples, only a certain number of classes (e.g., one) with the relatively largest predicted probability can be selected to render a discrete categorical prediction.
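- The squash-then-threshold flow described above, in a few lines of NumPy; the scores and threshold are illustrative:
```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])              # real-valued class scores
probs = np.exp(scores) / np.exp(scores).sum()   # Softmax: values in (0, 1) summing to one
prediction = int(np.argmax(probs))              # discrete categorical prediction
confident = probs[prediction] > 0.5             # optional threshold comparison
print(probs, prediction, confident)
```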
- machine learning module 310 may be trained using supervised learning techniques. For example, machine learning module 310 may be trained on a training dataset that includes training examples labeled as belonging (or not belonging) to one or more classes.
- machine learning module 310 can perform regression to provide output data in the form of a continuous numeric value.
- the continuous numeric value can correspond to any number of different metrics or numeric representations, including, for example, currency values, scores, or other numeric representations.
- machine learning module 310 can perform linear regression, polynomial regression, or nonlinear regression.
- machine learning module 310 can perform simple regression or multiple regression.
- a Softmax function or other function or layer can be used to squash a set of real values respectively associated with two or more possible classes to a set of real values in the range (0, 1) that sum to one.
- Machine learning module 310 may perform various types of clustering.
- machine learning module 310 can identify one or more previously-defined clusters to which input data 333 most likely corresponds.
- Machine learning module 310 may identify one or more clusters within input data 333. That is, in instances in which input data 333 includes multiple objects, documents, or other entities, machine learning module 310 can sort the multiple entities included in input data 333 into a number of clusters. In some examples in which machine learning module 310 performs clustering, machine learning module 310 can be trained using unsupervised learning techniques.
- Machine learning module 310 may perform anomaly detection or outlier detection. For example, machine learning module 310 can identify input data that does not conform to an expected pattern or other characteristic (e.g., as previously observed from previous input data). As examples, the anomaly detection can be used for fraud detection or system failure detection.
- machine learning module 310 can provide output data in the form of one or more recommendations.
- machine learning module 310 can be included in a recommendation system or engine.
- machine learning module 310 can output a suggestion or recommendation of one or more additional entities that, based on the previous outcomes, are expected to have a desired outcome (e.g., elicit a score, ranking, or rating indicative of success or enjoyment).
- a recommendation system can output a suggestion or recommendation of an application that the user might enjoy or wish to download to computing device 112.
- Machine learning module 310 may, in some cases, act as an agent within an environment.
- machine learning module 310 can be trained using reinforcement learning, which will be discussed in further detail below.
- machine learning module 310 can be a parametric model while, in other implementations, machine learning module 310 can be a non-parametric model. In some examples, machine learning module 310 can be a linear model while, in other implementations, machine learning module 310 can be a non-linear model.
- machine learning module 310 can be or include one or more of various different types of machine-learned models. Examples of such different types of machine-learned models are provided below for illustration. One or more of the example models described below can be used (e.g., combined) to provide output data 335 in response to input data 333. Additional models beyond the example models provided below can be used as well.
- machine learning module 310 can be or include one or more classifier models such as, for example, linear classification models; quadratic classification models; etc.
- Machine learning module 310 may be or include one or more regression models such as, for example, simple linear regression models; multiple linear regression models; logistic regression models; stepwise regression models; multivariate adaptive regression splines; locally estimated scatterplot smoothing models; etc.
- machine learning module 310 can be or include one or more decision tree-based models such as, for example, classification and/or regression trees; iterative dichotomiser 3 decision trees; C4.5 decision trees; chi-squared automatic interaction detection decision trees; decision stumps; conditional decision trees; etc.
- Machine learning module 310 may be or include one or more kernel machines. In some examples, machine learning module 310 can be or include one or more support vector machines. Machine learning module 310 may be or include one or more instance-based learning models such as, for example, learning vector quantization models; self-organizing map models; locally weighted learning models; etc. In some examples, machine learning module 310 can be or include one or more nearest neighbor models such as, for example, k-nearest neighbor classification models; k-nearest neighbors regression models; etc.
- Machine learning module 310 can be or include one or more Bayesian models such as, for example, naive Bayes models; Gaussian naive Bayes models; multinomial naive Bayes models; averaged one-dependence estimators; Bayesian networks; Bayesian belief networks; hidden Markov models; etc.
- machine learning module 310 can be or include one or more artificial neural networks (also referred to simply as neural networks).
- a neural network can include a group of connected nodes, which also can be referred to as neurons or perceptrons.
- a neural network can be organized into one or more layers. Neural networks that include multiple layers can be referred to as “deep” networks. A deep network can include an input layer, an output layer, and one or more hidden layers positioned between the input layer and the output layer. The nodes of the neural network can be fully connected or non-fully connected.
- Machine learning module 310 can be or include one or more feed forward neural networks. In feed forward networks, the connections between nodes do not form a cycle. For example, each connection can connect a node from an earlier layer to a node from a later layer.
- machine learning module 310 can be or include one or more recurrent neural networks.
- at least some of the nodes of a recurrent neural network can form a cycle.
- Recurrent neural networks can be especially useful for processing input data that is sequential in nature.
- a recurrent neural network can pass or retain information from a previous portion of input data 333 sequence to a subsequent portion of input data 333 sequence through the use of recurrent or directed cyclical node connections.
- sequential input data can include time-series data (e.g., sensor data versus time or imagery captured at different times).
- a recurrent neural network can analyze sensor data versus time to detect or predict a swipe direction, to perform handwriting recognition, etc.
- Sequential input data may include words in a sentence (e.g., for natural language processing, speech detection or processing, etc.); notes in a musical composition; sequential actions taken by a user (e.g., to detect or predict sequential application usage); sequential object states; etc.
- Example recurrent neural networks include long short-term memory (LSTM) recurrent neural networks; gated recurrent units; bi-directional recurrent neural networks; continuous time recurrent neural networks; neural history compressors; echo state networks; Elman networks; Jordan networks; recursive neural networks; Hopfield networks; fully recurrent networks; sequence-to-sequence configurations; etc.
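- A minimal recurrent-network sketch for sequential input, assuming PyTorch; the dimensions are illustrative:
```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
sequence = torch.randn(1, 5, 10)          # one example, five time steps
outputs, (hidden, cell) = lstm(sequence)  # state carried across time steps
print(outputs.shape)                      # torch.Size([1, 5, 20])
```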
- machine learning module 310 can be or include one or more convolutional neural networks.
- a convolutional neural network can include one or more convolutional layers that perform convolutions over input data using learned filters.
- Filters can also be referred to as kernels.
- Convolutional neural networks can be especially useful for vision problems such as when input data 333 includes imagery such as still images or video. However, convolutional neural networks can also be applied for natural language processing.
- machine learning module 310 can be or include one or more generative networks such as, for example, generative adversarial networks.
- Generative networks can be used to generate new data such as new images or other content.
- Machine learning module 310 may be or include an autoencoder.
- the aim of an autoencoder is to learn a representation (e.g., a lower-dimensional encoding) for a set of data, typically for the purpose of dimensionality reduction.
- an autoencoder can seek to encode input data 333 and then provide output data that reconstructs input data 333 from the encoding.
- the autoencoder concept has become more widely used for learning generative models of data.
- the autoencoder can include additional losses beyond reconstructing input data 333.
- Machine learning module 310 may be or include one or more other forms of artificial neural networks such as, for example, deep Boltzmann machines; deep belief networks; stacked autoencoders; etc. Any of the neural networks described herein can be combined (e.g., stacked) to form more complex networks.
- One or more neural networks can be used to provide an embedding based on input data 333.
- the embedding can be a representation of knowledge abstracted from input data 333 into one or more learned dimensions.
- embeddings can be a useful source for identifying related entities.
- embeddings can be extracted from the output of the network, while in other instances embeddings can be extracted from any hidden node or layer of the network (e.g., a close to final but not final layer of the network).
- Embeddings can be useful for performing auto suggest next video, product suggestion, entity or object recognition, etc.
- embeddings can be useful inputs for downstream models. For example, embeddings can be useful to generalize input data (e.g., search queries) for a downstream model or processing system.
- Machine learning module 310 may include one or more clustering models such as, for example, k-means clustering models; k-medians clustering models; expectation maximization models; hierarchical clustering models; etc.
- machine learning module 310 can perform one or more dimensionality reduction techniques such as, for example, principal component analysis; kernel principal component analysis; graph-based kernel principal component analysis; principal component regression; partial least squares regression; Sammon mapping; multidimensional scaling; projection pursuit; linear discriminant analysis; mixture discriminant analysis; quadratic discriminant analysis; generalized discriminant analysis; flexible discriminant analysis; autoencoding; etc.
- machine learning module 310 can perform or be subjected to one or more reinforcement learning techniques such as Markov decision processes; dynamic programming; Q functions or Q-learning; value function approaches; deep Q-networks; differentiable neural computers; asynchronous advantage actor-critics; deterministic policy gradient; etc.
- machine learning module 310 can be an autoregressive model.
- an autoregressive model can specify that output data 335 depends linearly on its own previous values and on a stochastic term.
- an autoregressive model can take the form of a stochastic difference equation.
- One example autoregressive model is WaveNet, a generative model for raw audio.
- machine learning module 310 can include or form part of a multiple model ensemble.
- bootstrap aggregating can be performed, which can also be referred to as “bagging.”
- a training dataset is split into a number of subsets (e.g., through random sampling with replacement) and a plurality of models are respectively trained on the number of subsets.
- respective outputs of the plurality of models can be combined (e.g., through averaging, voting, or other techniques) and used as the output of the ensemble.
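- As one concrete realization (not mandated by the disclosure), bootstrap aggregating with scikit-learn:
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)  # toy data
# Each tree trains on a random resampled subset; predictions are combined
bagging = BaggingClassifier(DecisionTreeClassifier(),
                            n_estimators=10, random_state=0)
bagging.fit(X, y)
print(bagging.predict(X[:3]))
```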
- Random forests are an ensemble learning method for classification, regression, and other tasks. Random forests are generated by producing a plurality of decision trees at training time. In some instances, at inference time, the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees can be used as the output of the forest. Random decision forests can correct for decision trees' tendency to overfit their training set.
- Another example ensemble technique is stacking, which can, in some instances, be referred to as stacked generalization.
- Stacking includes training a combiner model to blend or otherwise combine the predictions of several other machine-learned models.
- For example, a plurality of machine-learned models (e.g., of the same or different types) can be trained, and a combiner model can be trained to take the predictions from the other machine-learned models as inputs and, in response, produce a final inference or prediction.
- a single-layer logistic regression model can be used as the combiner model.
- Another example of an ensemble technique is boosting. Boosting can include incrementally building an ensemble by iteratively training weak models and then adding to a final strong model. For example, in some instances, each new model can be trained to emphasize the training examples that previous models misinterpreted (e.g., misclassified).
- a weight associated with each of such misinterpreted examples can be increased.
- One example boosting technique is AdaBoost.
- Other example boosting techniques include LPBoost; TotalBoost; BrownBoost; xgboost; MadaBoost; LogitBoost; gradient boosting; etc.
- In some examples, any of the models described above (e.g., regression models and artificial neural networks) can be combined to form an ensemble.
- an ensemble can include a top-level machine-learned model or a heuristic function to combine and/or weight the outputs of the models that form the ensemble.
- multiple machine-learned models (e.g., that form an ensemble) can be linked and trained jointly (e.g., through backpropagation of errors sequentially through the model ensemble).
- only a subset (e.g., one) of the jointly trained models is used for inference.
- machine learning module 310 can be used to preprocess input data 333 for subsequent input into another model.
- machine learning module 310 can perform dimensionality reduction techniques and embeddings (e.g., matrix factorization, principal components analysis, singular value decomposition, word2vec/GLOVE, and/or related approaches); clustering; and even classification and regression for downstream consumption.
- Input data 333 can include different types, forms, or variations of input data.
- input data 333 can include features that describe the content (or portion of content) initially selected by the user, e.g., content of a user-selected document or image, links pointing to the user selection, links within the user selection relating to other files available on device or cloud, metadata of the user selection, etc. Additionally, with user permission, input data 333 may include the context of user usage, either obtained from the app itself or from other sources.
- usage context examples include breadth of share (sharing publicly, or with a large group, or privately, or a specific person), context of share, etc.
- additional input data can include the state of the device, e.g., the location of the device, the apps running on the device, etc.
- machine learning module 310 can receive and use input data 333 in its raw form.
- the raw input data can be preprocessed.
- machine learning module 310 can receive and use the preprocessed input data.
- preprocessing input data 333 can include extracting one or more additional features from the raw input data.
- feature extraction techniques can be applied to input data 333 to generate one or more new, additional features.
- Example feature extraction techniques include edge detection; corner detection; blob detection; ridge detection; scale-invariant feature transform; motion detection; optical flow; Hough transform; etc.
- the extracted features can include or be derived from transformations of input data 333 into other domains and/or dimensions.
- the extracted features can include or be derived from transformations of input data 333 into the frequency domain.
- wavelet transformations and/or fast Fourier transforms can be performed on input data 333 to generate additional features.
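- For example, frequency-domain features can be derived with NumPy's fast Fourier transform; the signal is an illustrative stand-in:
```python
import numpy as np

signal = np.sin(np.linspace(0, 8 * np.pi, 256))  # stand-in sequential input
spectrum = np.abs(np.fft.rfft(signal))           # magnitude per frequency bin
features = spectrum[:10]                         # e.g., keep low-frequency bins
print(features.shape)                            # (10,)
```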
- the extracted features can include statistics calculated from input data 333 or certain portions or dimensions of input data 333.
- Example statistics include the mode, mean, maximum, minimum, or other metrics of input data 333 or portions thereof.
- input data 333 can be sequential in nature.
- the sequential input data can be generated by sampling or otherwise segmenting a stream of input data.
- frames can be extracted from a video.
- sequential data can be made non-sequential through summarization.
- portions of input data 333 can be imputed.
- additional synthetic input data can be generated through interpolation and/or extrapolation.
- some or all of input data 333 can be scaled, standardized, normalized, generalized, and/or regularized.
- Example regularization techniques include ridge regression; least absolute shrinkage and selection operator (LASSO); elastic net; least-angle regression; cross-validation; L1 regularization; L2 regularization; etc.
- some or all of input data 333 can be normalized by subtracting the mean across a given dimension’s feature values from each individual feature value and then dividing by the standard deviation or other metric.
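- In NumPy, this per-dimension normalization is a one-liner; the matrix is illustrative:
```python
import numpy as np

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
# Subtract each column's mean, then divide by its standard deviation
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm.mean(axis=0), X_norm.std(axis=0))  # ~0 and 1 per dimension
```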
- some or all or input data 333 can be quantized or discretized.
- qualitative features or variables included in input data 333 can be converted to quantitative features or variables. For example, one hot encoding can be performed.
- dimensionality reduction techniques can be applied to input data 333 prior to input into machine learning module 310.
- dimensionality reduction techniques including, for example, principal component analysis; kernel principal component analysis; graph-based kernel principal component analysis; principal component regression; partial least squares regression; Sammon mapping; multidimensional scaling; projection pursuit; linear discriminant analysis; mixture discriminant analysis; quadratic discriminant analysis; generalized discriminant analysis; flexible discriminant analysis; autoencoding; etc.
- input data 333 can be intentionally deformed in any number of ways to increase model robustness, generalization, or other qualities.
- Example techniques to deform input data 333 include adding noise; changing color, shade, or hue; magnification; segmentation; amplification; etc.
- output data 335 can include different types, forms, or variations of output data.
- output data 335 can include content, either stored locally on the user device or in the cloud, that is relevantly shareable along with the initial content selection.
- output data 335 can include various types of classification data (e.g., binary classification, multiclass classification, single label, multilabel, discrete classification, regressive classification, probabilistic classification, etc.) or can include various types of regressive data (e.g., linear regression, polynomial regression, nonlinear regression, simple regression, multiple regression, etc.). In other instances, output data 335 can include clustering data, anomaly detection data, recommendation data, or any of the other forms of output data discussed above.
- output data 335 can influence downstream processes or decision making. As one example, in some examples, output data 335 can be interpreted and/or acted upon by a rules-based regulator.
- Example computing devices include user computing devices (e.g., laptops, desktops, and mobile computing devices such as tablets, smartphones, wearable computing devices, etc.); embedded computing devices (e.g., devices embedded within a vehicle, camera, image sensor, industrial machine, satellite, gaming console or controller, or home appliance such as a refrigerator, thermostat, energy meter, home energy manager, smart home assistant, etc.); server computing devices (e.g., database servers, parameter servers, file servers, mail servers, print servers, web servers, game servers, application servers, etc.); dedicated, specialized model processing or training devices; virtual computing devices; other computing devices or computing infrastructure; or combinations thereof.
- a computing system that implements machine learning module 310 or other aspects of the present disclosure may include a number of hardware components that enable the performance of the techniques described herein.
- output data 335 obtained through machine learning module 310 at a computing system or device can be used to improve other device tasks or can be used by other non-user devices to improve services performed by or for such other non-user devices.
- output data 335 can improve other downstream processes performed by a server device for a computing device of a user or embedded computing device.
- output data 335 obtained through implementation of machine learning module 310 at a computing system or device can be sent to and used by a user computing device, an embedded computing device, or some other client device.
- computing system 200 of FIG. 2 may perform machine learning as a service.
- different respective portions of machine learning module 310 can be stored at and/or implemented by some combination of a user computing device; an embedded computing device; a server computing device; etc.
- portions of machine learning module 310 may be distributed in whole or in part amongst a client device (e.g., computing device 112 of FIG. 1) and a computing system (e.g., computing system 100 of FIG. 1).
- a computing device such as computing device 112 of FIG. 1 may perform graph processing techniques or other machine learning techniques using one or more machine learning platforms, frameworks, and/or libraries, such as, for example, TensorFlow, Caffe/Caffe2, Theano, Torch/PyTorch, MXNet, CNTK, etc.
- multiple instances of machine learning module 310 can be parallelized to provide increased processing throughput.
- the multiple instances of machine learning module 310 can be parallelized on a single processing device or computing device or parallelized across multiple processing devices or computing devices.
- a computing device that implements machine learning module 310 or other aspects of the present disclosure can include a number of hardware components that enable performance of the techniques described herein.
- a computing device can include one or more memory devices that store some or all of machine learning module 310.
- machine learning module 310 can be a structured numerical representation that is stored in memory.
- the one or more memory devices can also include instructions for implementing machine learning module 310 or performing other operations.
- Example memory devices include RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
- a computing device can also include one or more processing devices that implement some or all of machine learning module 310 and/or perform other related operations.
- Example processing devices include one or more of: a central processing unit (CPU); a visual processing unit (VPU); a graphics processing unit (GPU); a tensor processing unit (TPU); a neural processing unit (NPU); a neural processing engine; a core of a CPU, VPU, GPU, TPU, NPU or other processing device; an application specific integrated circuit (ASIC); a field programmable gate array (FPGA); a co-processor; a controller; or combinations of the processing devices described above.
- Processing devices can be embedded within other hardware components such as, for example, an image sensor, accelerometer, etc.
- machine learning module 310 described herein can be included in different portions of computer-readable code on a computing device.
- machine learning module 310 can be included in a particular application or program and used (e.g., exclusively) by such a particular application or program.
- a computing device can include a number of applications and one or more of such applications can contain its own respective machine learning library and machine-learned model(s).
- machine learning module 310 described herein can be included in an operating system of a computing device (e.g., in a central intelligence layer of an operating system) and can be called or otherwise used by one or more applications that interact with the operating system.
- each application can communicate with the central intelligence layer (and model(s) stored therein) using an application programming interface (API) (e.g., a common, public API across all applications).
- the central intelligence layer can communicate with a central device data layer.
- the central device data layer can be a centralized repository of data for the computing device.
- the central device data layer can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components.
- the central device data layer can communicate with each device component using an API (e.g., a private API).
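A rough sketch of this layering, assuming hypothetical class and method names (the disclosure does not define an API surface): applications call a common public method on the central intelligence layer, which in turn reads device context through a private device-data-layer call.

```python
class CentralDeviceDataLayer:
    """Hypothetical centralized repository of data for the computing device."""

    def __init__(self):
        self._components = {"sensors": {}, "context_manager": {}, "device_state": {}}

    def read(self, component: str) -> dict:
        # Stands in for the private API to sensors, context manager, etc.
        return self._components.get(component, {})


class CentralIntelligenceLayer:
    """Hypothetical OS layer hosting shared machine-learned model(s)."""

    def __init__(self, data_layer, model):
        self._data = data_layer
        self._model = model

    def infer(self, app_id: str, inputs):
        # Stands in for the common, public API used by all applications.
        context = self._data.read("context_manager")
        return self._model(inputs, context)


# An application calls only the public API:
layer = CentralIntelligenceLayer(CentralDeviceDataLayer(), lambda x, ctx: {"input": x})
print(layer.infer("com.example.mail", "draft a reply"))
```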
- Databases and applications can be implemented on a single system or distributed across multiple systems. Distributed components can operate sequentially or in parallel.
- machine learning techniques described herein are readily interchangeable and combinable. Although certain example techniques have been described, many others exist and can be used in conjunction with aspects of the present disclosure.
- a user may be provided with controls that enable the user to make an election as to both if and when systems, programs or features described herein may enable collection of user information (e.g., information about a user’s social network, social actions or activities, profession, a user’s preferences, or a user’s current location), and if the user is sent content or communications from a server.
- certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed.
- a user’s identity may be treated so that no personally identifiable information can be determined for the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined.
- FIG. 3C is a conceptual diagram illustrating a machine learning module configured to apply a large language model that accepts natural language input and provides code for corresponding graphical user interfaces and application functionality as output, in accordance with one or more techniques of this disclosure.
- Machine learning module 310 of FIG. 3C may be an example of machine learning module 310 of FIGS. 3 A and 3B.
- ML module 310 can be or include one or more transformer-based neural networks, such as a large language model module 342.
- Language model module 342 may implement, for example, the Pathways Language Model developed by Google.
- Transformer-based neural networks may refer to a type of deep learning architecture specifically designed for handling sequential data, such as text or time series.
- transformer-based neural networks like LLMs may be configured to perform natural language processing (NLP) tasks, such as question-answering, machine translation, text summarization, and sentiment analysis.
- Language model module 342 may be configured to perform tasks such as classification, sentiment analysis, entity extraction, extractive question answering, summarization, re-writing text in a different style, ad copy generation, and concept ideation.
- Transformer-based neural networks may utilize a self-attention mechanism, which allows the model to weigh the importance of different elements in a given input sequence relative to each other.
- the self-attention mechanism may help language model module 342 effectively capture long-range dependencies and complex relationships between elements, such as words in a sentence.
- Language model module 342 may include an encoder and a decoder that operate to process and generate sequential data, such as structured text. Both the encoder and decoder may include one or more of self-attention mechanisms, position-wise feedforward networks, layer normalization, or residual connections.
- the encoder may process an input sequence and create a representation that captures the relationships and context among the elements in the sequence. The decoder may then obtain the representation generated by the encoder and produce an output sequence.
- the decoder may generate the output one element at a time (e.g., one word at a time), using a process called autoregressive decoding, where the previously generated elements are used as input to predict the next element in the sequence.
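The following toy Python loop sketches autoregressive decoding as described: each newly generated element is appended to the input used to predict the next element. The `model` callable, token IDs, and vocabulary size are placeholders, not part of the disclosure.

```python
import numpy as np

def generate(model, prompt_tokens, eos_id, max_len=32):
    tokens = list(prompt_tokens)
    for _ in range(max_len):
        scores = model(tokens)          # next-token scores, shape (vocab_size,)
        next_id = int(np.argmax(scores))  # greedy choice of the next element
        tokens.append(next_id)          # previously generated output becomes input
        if next_id == eos_id:
            break
    return tokens

# Toy "model" that always predicts (last token + 1), capped at the EOS id 5.
print(generate(lambda t: np.eye(6)[min(t[-1] + 1, 5)], [0], eos_id=5))
# -> [0, 1, 2, 3, 4, 5]
```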
- instructions file 350 which includes the set of instructions, may include instructions for prompting the user to clarify their input.
- the program may be paused for these disambiguation steps.
- the prompt may include a list of options, a map, a question, etc. For example, if a user provides an input such as “Book appointment for Jane,” the computing system may generate a widget for the “Family” GUI that includes a map with multiple pediatrician locations, and the user may be prompted to clarify which specific pediatrician they would like to book Jane’s appointment at.
- language model module 342 may apply an LLM to the indication of the natural language user input to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories. In some examples, language model module 342 may apply an LLM to the indication of the natural language user input to identify the one or more categories. In some examples, language model module 342 may determine a set of information types included in the input (e.g., text or audio input or a transcription generated by speech-to-text module 226).
- An information type may be or otherwise include a topic, theme, point, subject, purpose, intent, keyword, etc.
- language model module 342 may determine the information type by leveraging a self-attention mechanism to capture the relationships and dependencies between words in the input sequence. For example, language model module 342 may tokenize (e.g., split) a sequence of words or subwords, which language model module 342 may convert into vectors (e.g., numerical representations) that language model module 342 can process. Language model module 342 may use the self-attention mechanism to weigh the importance of each token in relation to the others. In this way, language model module 342 may identify patterns and relationships between the tokens, and in turn the words corresponding to the tokens, that indicate one or more information types of the accessibility information.
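A minimal NumPy sketch of the scaled dot-product self-attention weighting described above; a production transformer additionally uses learned query/key/value projections, multiple heads, and positional information, none of which are shown here.

```python
import numpy as np

def self_attention(x):                       # x: (seq_len, d_model) token vectors
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)            # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ x                       # each token mixed with its context

tokens = np.random.default_rng(0).normal(size=(4, 8))  # 4 hypothetical embeddings
print(self_attention(tokens).shape)          # (4, 8)
```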
- language model module 342 may excel at performing NLP tasks, such as generating text and other content (e.g., new code that provides graphical components and functionality for performing one or more tasks).
- when generating specific types of content (e.g., specific information types), however, language model module 342 may have an increased likelihood of generating false, inaccurate, or low-quality information.
- language model module 342 may be configured to exclude the generation of content or code relating to a set of excluded information types.
- the set of excluded information types may include one or more of phone numbers, addresses, web addresses, functionality prohibited by an application, sensitive data (e.g., full bank account information), etc.
- input information may be passed into language model module 342 with certain prerequisites, prompts, or “rules” that can be stored in rules storage 344.
- Machine learning module 310 may apply these prerequisites, prompts, or rules when generating the set of instructions, or new code, associated with the functionality for performing the identified tasks and subtasks, and the corresponding GUIs and graphical components.
- machine learning module 310 may implement a rule such as, “Do not include user’s sensitive information” when generating instructions for generating a “Transfer Funds” widget that includes pre-populated input (e.g., instead of including a user’s full bank account number, the pre-populated input may include a string such as, “Bank Account ending in 1234”).
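As an illustration of how such a rule might be enforced before text reaches a prompt or a pre-populated widget, the following hedged sketch replaces long digit runs with an “ending in NNNN” placeholder; the regular expression and function name are assumptions for demonstration only.

```python
import re

def redact_account_numbers(text: str) -> str:
    # Replace any run of 8+ digits with a placeholder that keeps only
    # the last four digits, per the "no sensitive information" rule.
    return re.sub(r"\b\d{4,}(\d{4})\b", r"Bank Account ending in \1", text)

print(redact_account_numbers("Transfer from 001234567891234"))
# -> "Transfer from Bank Account ending in 1234"
```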
- machine learning module 310 may use accessibility information when generating code for GUIs and graphical components, such that the user can easily interact with the GUIs and graphical components.
- the rules may be text inputs such as, for example, “Keep GUI headings short.”
- rules storage 344 may store a plurality of text inputs and/or other data that further specify how instructions file 350 should be generated by machine learning module 310.
- language model module 342 may be applied to the indication of the natural language user input in accordance with the one or more predefined rules stored in rules storage 344, which may include, for example, unauthorized terms, unauthorized class names, unauthorized dimensions of the graphical user interface, unauthorized application functionality, etc. Because language model module 342 can interpret the rules along with the input, the computing system may provide more accurate instructions for generating functionality and associated GUIs and graphical components for performing identified tasks.
- the computing system may be able to interpret natural language to understand user intents, and then write or generate new, robust, working code that satisfies the user intents, can perform calculations, and can render new graphical user interfaces or components at machine speed, etc.
- language model module 342 may be a transformer-based neural network in some examples, in some examples, language model module 342 may be or otherwise include one or more other types of neural networks.
- language model module 342 may be or include an autoencoder.
- the aim of an autoencoder is to learn a representation (e.g., a lower-dimensional encoding) for a set of data, typically for the purpose of dimensionality reduction.
- an autoencoder can seek to encode the input data and then provide output data that reconstructs the input data from the encoding.
- the autoencoder can include additional losses beyond reconstructing the input data.
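A compact PyTorch sketch of an autoencoder with a reconstruction loss plus one additional penalty term (here, an arbitrary sparsity penalty on the code), illustrating the “additional losses” point; the layer sizes and penalty weight are assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 8), nn.ReLU())   # 32-d input -> 8-d code
decoder = nn.Sequential(nn.Linear(8, 32))              # reconstruct the input
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

x = torch.randn(64, 32)                 # hypothetical input batch
code = encoder(x)
recon = decoder(code)

# Reconstruction loss plus an extra loss beyond reconstruction (sparsity).
loss = nn.functional.mse_loss(recon, x) + 1e-3 * code.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```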
- machine learning module 310 may minimize how often language model module 342 is invoked by caching the generated set of instructions, or new code, in instructions cache 348.
- language model module 342 may use a prompt including user intent (e.g., the output from speech-to-text module 226 of FIG. 2) and any contextual information received by the computing system. More specifically, in some examples, prior to generating the set of instructions, the computing system may perform “memory injection,” which may be considered a process in which an identified task may be passed to a system that can look up and append additional context to the task.
- identified tasks such as “Send money to Mike” and “Book Jane’s appointment,” may be passed as input to machine learning module 310, in which machine learning module 310 may determine if any relevant context information has been stored by the computing system.
- machine learning module 310 may determine the following relevant context information: “The user has a husband called Mike” and “The user has a daughter aged 3 called Jane.”
- Machine learning module 310 may then include both the user input (e.g., the identified tasks) and the relevant context information in a prompt, which may then be used to generate the set of instructions.
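A hedged sketch of this memory-injection step, with a hypothetical in-memory context store standing in for whatever the computing system actually persists; the lookup heuristic is deliberately simplistic.

```python
# Hypothetical stored context; the disclosure does not define a storage format.
CONTEXT_STORE = {
    "mike": "The user has a husband called Mike.",
    "jane": "The user has a daughter aged 3 called Jane.",
}

def inject_memory(task: str) -> str:
    # Look up relevant stored facts and append them to the task as a prompt.
    hits = [fact for key, fact in CONTEXT_STORE.items() if key in task.lower()]
    context = " ".join(hits) if hits else "No stored context."
    return f"Task: {task}\nKnown context: {context}\nGenerate UI instructions."

print(inject_memory("Book Jane's appointment"))
```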
- machine learning module 310 may implement one or more self-prompting or recursive prompting models, e.g., language model module 342 may generate prompts based on retrieved application information, user input, context information, etc., which may involve generating follow-up questions, inferences, or further instructions that can guide subsequent stages of processing.
- a prompt may include one or more APIs from API module 206, in which the one or more APIs may then be included in instructions file 350.
- instructions file 350 may include instructions for gathering more specific details or data at runtime (e.g., one or more task APIs may send requests to the one or more applications at runtime). In this way, portions of generated code may be reused.
- machine learning module 310 may be configured to perform instruction embedding, in which representations (i.e., embeddings) of frequently used or critical instructions are stored in instructions cache 348.
- instructions file 350 may be generated based on the instructions stored in instructions cache 348 and any additional instructions, information, or updates retrieved by an API at runtime that are not present in instructions cache 348.
- language model module 342 may generate a general set of instructions for rescheduling any meeting on any day and store the instructions in instructions cache 348. If the user provides the same “reschedule today’s meeting” command in the future, language model module 342 may generate instructions file 350 including the cached instructions and an API call that retrieves, e.g., calendar application data pertaining to the future date. Thus, instructions file 350 may provide functionality for rescheduling a specific meeting on the future date.
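The caching pattern might look roughly like the following sketch, in which a general instruction template is generated once per normalized intent and only runtime-specific data is fetched on reuse; every function name here is illustrative.

```python
instructions_cache = {}

def get_instructions(intent_key, llm_generate, fetch_runtime_data):
    if intent_key not in instructions_cache:
        # Invoke the LLM once to produce a reusable, general template.
        instructions_cache[intent_key] = llm_generate(intent_key)
    template = instructions_cache[intent_key]
    # Runtime data (e.g., calendar entries for a future date) is fetched
    # via an API call rather than by re-invoking the LLM.
    return {"template": template, "runtime_data": fetch_runtime_data()}

# Hypothetical collaborators for illustration only.
result = get_instructions(
    "reschedule_meeting",
    llm_generate=lambda k: f"<general instructions for {k}>",
    fetch_runtime_data=lambda: {"date": "2024-10-15", "meetings": ["standup"]},
)
print(result)
```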
- machine learning module 310 may reuse the frequently used or critical instructions without having to invoke language model module 342 on data other than what is included in the prompt (e.g., language model module 342 may not have to re-apply the large language model to the information associated with the predefined functions included in the one or more applications).
- the prompt may only include contextual information, and data indicative of user intent may be stored in instructions cache 348.
- machine learning module 310 may apply code caching to both compiled and interpreted languages. Machine learning module 310 may implement various types of caching, such as, for example, Just-In-Time (JIT) compilation, Ahead-Of-Time (AOT) compilation, and bytecode caching.
- machine learning module 310 may generate instructions file 350 using language model module 342, in which instructions file 350 may be generated based on one or more of application functionality, capabilities, and/or attributes included in retrieved application information, contextual information (e.g., user data), the natural language audio or text input received by the computing system, and/or the transcribed text output from a speech-to-text module.
- a prompt may be generated by machine learning module 310, in which the prompt may specify an output format, allowed data types, a UI component library that can be used to build the resulting UI, an API library including APIs that can be used to retrieve data from the applications at runtime, user input (e.g., the identified tasks), and context information.
- the prompt may then be provided to language model module 342 as input, in which language model module 342 may then generate instructions file 350 that includes code for accessing relevant device APIs and returning relevant UI components that provide functionality for performing tasks.
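The prompt contents enumerated above might be assembled roughly as follows; every field name and value in this sketch is an assumption, since the disclosure does not fix a prompt schema.

```python
# Illustrative prompt structure passed to the language model module.
prompt = {
    "output_format": "instructions_file_v1",
    "allowed_data_types": ["string", "number", "date", "currency"],
    "ui_component_library": ["Widget", "Button", "TextField", "Map", "Slider"],
    "api_library": ["calendar.lookup", "banking.transfer", "contacts.search"],
    "user_input": ["Send money to Mike", "Book Jane's appointment"],
    "context": ["Mike R. requested $20", "Jane is the user's daughter, age 3"],
}
# instructions_file = language_model(prompt)
# The returned instructions would include code for accessing relevant device
# APIs and for rendering the UI components that perform the tasks.
```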
- the prompt(s) used to generate instructions file 350 may be used by machine learning module 310 to determine whether a user’s desired application functionality or “new” application functionality is possible or within reason (e.g., a task widget may not be associated with functionality for transferring funds if the user does not have a banking application downloaded on their user device).
- the set of instructions, and/or the “generated,” “desired,” or “new” functionality described herein may be defined as functionality or code that is dynamically generated by machine learning module 310 on the basis of the retrieved information associated with predefined application functionality, user input, and/or other information retrieved from a user computing device.
- the at least one function for performing a respective task may include a combination of data and/or predefined application functionality retrieved from different applications.
- the graphical components, e.g., task widgets, associated with the at least one function for performing a respective task may provide a “shortcut” for completing the respective task. For example, instead of requiring a user to navigate through a messaging application to find out how much money the user needs to send to Mike, and then having to navigate through a banking application to search for Mike’s banking account username, initiate a new transfer, and manually enter in all relevant information, a single task widget may provide the user the functionality for performing all of the aforementioned actions, e.g., automatically, or with one or more simple clicks or interactions with the task widget.
- instructions file 350 may include the instructions for generating the at least one GUI associated with a respective category, in which the at least one GUI includes at least one graphical component associated with at least one function for performing a respective task. Instructions file 350 may also be stored in a memory of the computing system such that instructions file 350 can be resent, updated, or sent to a computing device associated with the user, one or more other computing devices associated with the user, and/or, in some examples, with explicit consent from the user, one or more other computing devices associated with one or more other users. As an example, in some examples, the computing system may receive, from a computing device, a request to send instructions file 350 to a companion device associated with the computing device, in which the computing system may then send, to the companion device, instructions file 350.
- instructions file 350 may include all data collected or used by the computing system to generate instructions file 350.
- instructions file 350 may include details for how the user's natural language was resolved into working code.
- users may be able to view or “inspect” instructions file 350.
- a user may be provided various controls to clarify, inspect, or stop a task to ensure that the computing system is following the user’s intent.
- the generated GUIs and/or graphical components may be inspectable, in which users can, for example, interact with widgets to see the associated data, code or instructions (e.g., instructions file 350), or pinch to expand widgets to reveal more controls.
- a user may be able to edit instructions file 350.
- a user may edit the intent or parameters used by machine learning module 310, and instructions file 350 may be updated to reflect the edits.
- users may interact with the GUIs and/or graphical components to add or delete GUIs and/or graphical components, directly edit parameters, edit the order of the GUIs, the positioning of the graphical components, change, add, or delete visual effects, etc.
- any predetermined or suggested input determined by machine learning module 310, the functionality generated by machine learning module 310, the GUIs and graphical components, and any other data included in instructions file 350 may be customizable or user-configurable.
- the user interface generation provided by the computing system may require less time and/or effort to create new functionality and/or graphical user interfaces and components for performing a user’s identified tasks. That is, instead of users having to remember multiple tasks and navigate through multiple applications and user interfaces to access relevant information and functionality for performing their multiple tasks, the techniques of this disclosure may provide users the ability to quickly have their tasks organized into GUIs by simply providing natural language input. Furthermore, the organized GUIs may further provide users the ability to quickly perform their tasks, as doing so may only require a user to simply interact with a single widget.
- FIG. 4 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- Computing system 400 may be similar if not substantially similar to computing system 100 of FIG. 1 and computing system 200 of FIG. 2.
- Computing device 412 may be similar if not substantially similar to computing device 112 of FIG. 1.
- User interface (UI) components 402 may be similar if not substantially similar to UI components 102 of FIG. 1.
- Network 401 may be similar if not substantially similar to network 101 of FIG. 1.
- some or all of the techniques described with respect to computing system 400 may be implemented locally on computing device 412.
- computing system 400 may retrieve, using API module 406, information associated with predefined functions included in one or more applications executing at computing device 412. With explicit consent from the user, computing system 400 may also retrieve, using API module 406, other data and/or context information from computing device 412, such as historical user data, device data, user activity data, etc.
- UI module 404 may receive, from computing device 412, an indication of a natural language input such as “Send money to Mike, book Jane’s appointment, plan trip with John...,” in which the input is associated with one or more predefined functions included in the one or more applications.
- sending money, selecting a recipient for the money, booking an appointment, purchasing a flight, browsing the Internet, sending a message, selecting a recipient for the message, etc. may be examples of predefined functionality for a banking application, a healthcare application, an airline application, a web browser application, and a messaging application executing on computing device 412.
- machine learning module 410 may apply a language model to the indication of the example natural language user input to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories. For example, machine learning module 410 may identify a first task, “Send money to Mike,” a second task, “Book Jane’s appointment,” and a third task, “Plan trip with John.” Machine learning module 410 may determine, based on the retrieved information, data, and/or context information from computing device 412, that the first and second tasks are associated with a “Family” category. More specifically, in this example, data retrieved from computing device 412 may indicate that Mike and Jane are family members of the user, and that Jane is a child.
- a text message received from Mike R. that states, “Can you send me $20?”, and another text message received from another family member that states, “Can you book Jane’s doctor’s appointment for next week?” may further provide additional context information for determining user intent.
- computing system 400 may determine, based on the text message, that the “Mike” referred to in the user’s input is specifically Mike R., and not Mike B., Mike C., or Mike L., as the text message was received from Mike R.
- machine learning module 410 may determine that performing the first task requires functionality for sending $20 from the user’s preferred bank account to Mike R.’s banking account.
- Machine learning module 410 may determine that performing the second task requires functionality for booking an appointment for Jane at a local pediatrician next week at a time outside of the user’s scheduled meetings. Therefore, performing each task may require functionality from multiple different applications.
- UI generator module 408 may apply, using the information associated with the plurality of functions, machine learning module 410 to the first task and the second task to generate a set of instructions.
- the set of instructions may be considered dynamic and may be generated at runtime based on user input and retrieved information.
- the set of instructions may combine data retrieved from the one or more applications, such that a user may complete a task without having to navigate through the multiple associated applications.
- the set of instructions may provide at least one function for performing a respective task from the one or more tasks.
- the set of instructions may provide at least one function for sending $20 from the user’s preferred bank account to Mike R.’s banking account, and at least one function for booking an appointment for Jane at a local pediatrician next week at a time outside of the user’s scheduled meetings.
- the at least one graphical user interface associated with the respective category includes one or more of at least one graphical component including text data associated with the respective category, at least one graphical component including text data associated with information from the one or more applications, at least one graphical component associated with one or more suggested inputs, and at least one suggested graphical component associated with the at least one function for performing the respective task.
- the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
- the set of instructions may include instructions for generating GUI 417 associated with the “Family” category, which is demonstrated in FIG. 4 with text data 451 (“FAMILY”), which may be considered a GUI header.
- GUI 417 associated with the “Family” category includes widget 452, titled “Send Money to Mike,” which is associated with the at least one function for sending $20 from the user’s preferred bank account to Mike R.’s banking account.
- widget 452 includes prepopulated text entry fields, such as “pay” text entry field 453 that is prepopulated with an input of “$20.00,” “from” text entry field 447 that is prepopulated with an input of “Checking Acct 1234,” and “to” text entry field 449 that is prepopulated with an input of “Mike R.”
- widget 452 includes “Send” button 454, which may be configured to provide the generated functionality that sends $20 from the user’s bank account ending in 1234 to Mike R.’s banking account upon the user interacting with “Send” button 454.
- GUI 417 may include widget 455 titled “Book Jane’s Appointment,” which is associated with the at least one function for booking an appointment for Jane at a local pediatrician next week at a time outside of the user’s scheduled meetings.
- machine learning module 410 may not have enough data to determine the user’s intent with high confidence. Therefore, as shown in the example of FIG. 4, widget 455 includes text prompt 443 “Which pediatrician?”, map 445 showing the locations of pediatrician A, pediatrician B, and pediatrician C, and one or more suggested inputs, which are shown as buttons 456 that each correspond to a specific pediatrician.
- the set of instructions includes instructions for prompting the user to clarify which suggested pediatrician they would like to book Jane’s appointment at. Responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs (e.g., selecting the button that corresponds to Pediatrician A), computing system 400 may update widget 455.
- UI generator module 408 may generate instructions for generating GUI 417 that provides the user the ability to perform their tasks in a quick and organized manner, e.g., by simply interacting with widget 452 and widget 455. As such, the user may find it easier to complete tasks, and may enjoy an overall improved user experience.
- FIG. 5 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- Computing system 500 may be similar if not substantially similar to computing system 100 of FIG. 1, computing system 200 of FIG. 2, and computing system 400 of FIG. 4.
- Computing device 512 may be similar if not substantially similar to computing device 112 of FIG. 1 and computing device 412 of FIG. 4.
- User interface (UI) components 502 may be similar if not substantially similar to UI components 102 of FIG. 1 and UI components 402 of FIG. 4.
- Network 501 may be similar if not substantially similar to network 101 of FIG. 1 and network 401 of FIG. 4.
- some or all of the techniques described with respect to computing system 500 may be implemented locally on computing device 512.
- FIG. 5 includes widget 557 that indicates the first task of sending $20 to Mike was completed (e.g., as shown, widget 557 may include a check mark), and widget 558 that indicates the second task of booking Jane’s appointment was completed (e.g., as shown, widget 558 may include a check mark).
- computing system 500 may update the set of instructions to include instructions for generating updated graphical components, in which the updated graphical components indicate that the respective task was performed.
- “Family” GUI 517 may include widget 559 titled “Order Jersey,” which may be generated based on, for example, context information retrieved from computing device 512, such as a text message received that states, “Can you order a jersey for Jack?”
- based on the text message and other context information (e.g., historical user data, data retrieved from the one or more applications executing at computing device 512, etc.), computing system 500 may further determine that Jack is another family member of the user, a child, has a preference for a specific football team, wears a specific size, etc., and that the user has historically preferred to purchase similar items within a specific price range.
- computing system 500 may be configured to determine tasks that are not explicitly included in an indication of a natural language user input, but rather determined based on the context information retrieved from computing device 512.
- the set of instructions may include instructions for generating the category GUIs on a basis of a level of importance.
- the graphical components included in each GUI may also be generated on a basis of a level of importance or priority. In this example, because the indication of the natural language user input explicitly included the tasks of sending $20 to Mike and booking an appointment for Jane, the widgets associated with each of those tasks may be displayed first, e.g., at a top portion of GUI 517, because those tasks may be assigned a higher level of priority.
- widget 559, associated with the task of ordering Jack a jersey, which is a task that was not explicitly included in the indication of the natural language user input, may be assigned a lower level of priority, and therefore may be displayed last, e.g., at a bottom portion of GUI 517.
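A minimal sketch of this priority ordering, assuming a simple two-level scheme in which explicitly requested tasks outrank inferred ones; the task records are illustrative.

```python
tasks = [
    {"title": "Send Money to Mike", "explicit": True},
    {"title": "Book Jane's Appointment", "explicit": True},
    {"title": "Order Jersey", "explicit": False},   # inferred from context only
]

def priority(task):
    return 0 if task["explicit"] else 1             # lower value sorts first

for widget in sorted(tasks, key=priority):
    print(widget["title"])                          # render order, top to bottom
```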
- GUIs and/or graphical components described herein may be scrollable, such that, e.g., GUI 517 may include any number of graphical components corresponding to functionality for performing one or more tasks.
- machine learning module 510 may determine that the task of ordering Jack a jersey may require functionality from multiple different applications, such as functionality for browsing the web, functionality for determining Jack’s preferred size, functionality for completing a transaction, etc. Therefore, the set of instructions may provide at least one function for, e.g., purchasing a youth size small jersey for a specific football team that is within a price range of $5.00-$15.00.
- parameters involved for performing a task may be pre-populated by UI generator module 508, e.g., based on context information, historical user data, application information, etc. retrieved from computing device 512.
- As shown in the example of FIG. 5, widget 559 may display one or more suggested inputs, such as a suggested jersey for the specific football team chosen based on a suggested size parameter of “Youth Size S” and a suggested price parameter of “$9.99.” Widget 559 may further include “Buy” button 560, which a user may interact with to perform the task of ordering the jersey. Furthermore, as shown, widget 559 may include one or more user-configurable controls, such as “Price” slider 561, which a user may interact with to set a preferred price range.
- computing system 500 may update the set of instructions to include instructions for generating an updated widget 559 that includes functionality for performing the task with the updated parameters. Additionally, as shown, widget 559 may be scrollable, such that a user may swipe horizontally to discover other suggested jerseys for sale.
- computing system 500 may provide users the ability to perform tasks, even when a user may not be “thinking of” the tasks, e.g., may not explicitly say their intent for completing such tasks.
- the graphical components, e.g., widgets, associated with the generated functionality for performing tasks may be positioned on a category GUI based on a level of priority, such that users may be presented with higher priority task widgets first, and may complete their tasks in a more organized manner.
- FIG. 6 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- Computing system 600 may be similar if not substantially similar to computing system 100 of FIG. 1, computing system 200 of FIG. 2, computing system 400 of FIG. 4, and computing system 500 of FIG. 5.
- Computing device 612 may be similar if not substantially similar to computing device 112 of FIG. 1, computing device 412 of FIG. 4, and computing device 512 of FIG. 5.
- User interface (UI) components 602 may be similar if not substantially similar to UI components 102 of FIG. 1, UI components 402 of FIG. 4, and UI components 502 of FIG. 5.
- Network 601 may be similar if not substantially similar to network 101 of FIG. 1, network 401 of FIG. 4, and network 501 of FIG. 5.
- some or all of the techniques described with respect to computing system 600 may be implemented locally on computing device 612.
- computing system 600 may update the set of instructions to include instructions for generating updated graphical components that indicate a respective task was performed, display relevant information pertaining to the completed task, and/or provide functionality for performing the task again, e.g., with different parameters. For example, responsive to the user selecting button 456 of FIG. 4 that corresponds to Pediatrician A, computing system 600 may update the set of instructions to include instructions for generating widget 661 in place of widget 455.
- widget 661 may provide information pertaining to the completed task of booking Jane’s appointment, such as the date (e.g., “Friday Oct 15, 2024”), a summary of the event (“Jane at Doctor’s”), time (“1:10 PM–1:30 PM”), and specific pediatrician (“Pediatrician A”) for which the appointment was booked. That is, widget 661 may provide a “reminder” to the user about Jane’s appointment, and, as shown, may be displayed at a top portion of GUI 617, as widget 661 may be assigned a higher level of priority than, for example, widget 659 (which may be similar if not substantially similar to widget 559 of FIG. 5).
- widget 661 may include “Call Doctor” button 662, which may be associated with at least one function for performing a task of rescheduling Jane’s doctor’s appointment.
- machine learning module 610 may intuitively determine additional actions or subtasks associated with a task that may be performed after the task is completed. In this way, users may not be required to explicitly provide input indicating their desire to perform such additional actions or subtasks; instead, computing system 600 may automatically determine the additional actions or subtasks, e.g., based on information retrieved from computing device 612, and automatically generate graphical components associated with functionality for completing the additional actions or subtasks.
- a user may provide one or more additional indications of a natural language input, from which machine learning module 610 may identify one or more tasks associated with and/or not associated with the one or more predetermined categories.
- the user may provide an additional indication of a natural language input that includes the utterance, “How many days until the kids start school?”
- UI generator module 608 may apply machine learning module 610 to this additional indication and determine a task of determining how many days there are until the user’s children start their next school year, which machine learning module 610 may further determine to be associated with the “Family” category.
- UI generator module 608 may generate a set of instructions that include instructions for generating widget 663, titled “Days until school starts,” and may include functionality for counting down the days until the user’s children start their next school year (e.g., based on information retrieved from a calendar application, etc.).
- some graphical components may not require a user to interact with the graphical components to perform a particular task. That is, in some examples, one or more graphical components may simply provide relevant information, reminders, notes, etc. that may answer a user’s query or intent.
- FIG. 7A is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- Computing system 700 may be similar if not substantially similar to computing system 100 of FIG. 1, computing system 200 of FIG. 2, computing system 400 of FIG. 4, computing system 500 of FIG. 5, and computing system 600 of FIG. 6.
- Computing device 712 may be similar if not substantially similar to computing device 112 of FIG. 1, computing device 412 of FIG. 4, computing device 512 of FIG. 5, and computing device 612 of FIG. 6.
- User interface (UI) components 702 may be similar if not substantially similar to UI components 102 of FIG. 1, UI components 402 of FIG. 4, UI components 502 of FIG. 5, and UI components 602 of FIG. 6.
- Network 701 may be similar if not substantially similar to network 101 of FIG. 1, network 401 of FIG. 4, network 501 of FIG. 5, and network 601 of FIG. 6. Furthermore, some or all of the techniques described with respect to computing system 700 may be implemented locally on computing device 712.
- the indication of a natural language user input may include an identified task such as, “Plan the trip with John,” which may involve multiple actions or subtasks to complete.
- machine learning module 710 may determine, based on the indication of the natural language input, a “Trip” category, in which the set of instructions may include instructions for generating GUI 766 associated with the “Trip” category (demonstrated in FIG. 7A with text data 765 (“TRIP”), which may be considered a GUI header).
- One example subtask determined by machine learning module 710 may be a subtask of booking an accommodation.
- As shown in the example of FIG. 7A, GUI 766 may include widget 767A, titled “Book Accommodation,” which may include one or more suggested accommodations based on one or more suggested input parameters.
- the at least one graphical component (e.g., widget 767A) generated by the set of instructions includes a first graphical component and a second graphical component, in which the first graphical component is associated with a first function for performing a respective task, and the second graphical component is associated with a second function for performing the respective task.
- widget 767A may include additional graphical components, e.g., “sub-widgets,” such as sub-widget 768 titled “Budget” that includes slider 769, and sub-widget 770 titled “Distance” that includes draggable circle (i.e., radius selector) 771, in which slider 769 and draggable circle 771 may be considered user-configurable controls that provide functionality for tasks such as setting a price range and setting a location radius.
- sub-widget 768 may display a suggested price range.
- sub-widget 770 may display a suggested location and search radius, in which the suggested parameters may result in one or more suggested graphical components.
- the suggested parameters resulted in suggested sub-widget 781 corresponding to a suggested “Cottage” accommodation, suggested sub-widget 782 corresponding to a suggested “Hotel” accommodation, and suggested sub-widget 783 corresponding to a suggested “Family Home” accommodation.
- the user may interact with slider 769 to adjust the price range parameters, and may interact with draggable circle 771 to adjust the location and radius for the search, in which computing system 700 may then update the set of instructions to include instructions for generating an updated widget 767B (shown in FIG. 7B) that displays suggestions based on the updated parameters.
- FIG. 7B is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- widget 767B includes sub-widget 772 titled, “Vibes,” which further includes at least one keyword, such as “Quiet” keyword 773, “Unique” keyword 784, and “Cozy” keyword 785, and at least one user-configurable control, such as draggable circles 774 that each correspond to a keyword.
- the suggested sub-widgets 781, 782, and 783 each corresponding to a suggested accommodation may be based on one or more of the at least one keyword and at least one user-configurable control. That is, as shown in the example of FIG. 7B, the suggested sub-widgets 781, 782, and 783 each corresponding to a suggested accommodation may be based on the user-configurable budget parameter set by sub-widget 768 and user-configurable distance parameter set by sub-widget 770 (which in this example, may be collapsible/expandable widgets), keywords 773, 784, and 785, and user-configurable controls corresponding to a keyword, such as draggable circles 774.
- sub-widget 772 may be considered a “tension triangle” configured to alter parameters based on user interaction with the triangle.
- computing system 700 may receive an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control, e.g., the user may interact with a draggable circle 774 to slide the draggable circle 774 closer or farther away to corresponding “Quiet” keyword 773.
- a user may indicate a level of importance for each keyword based on the distance from the keyword at which the user-configurable control (e.g., the draggable circle) is set.
- computing system 700 may receive the indication of this user input, and update, based on the indication of the user input, the at least one suggested graphical component. That is, computing system 700 may update the set of instructions to include instructions for generating one or more suggested graphical components that each correspond to an updated suggestion.
- suggested sub-widgets 781, 782, and 783 each corresponding to a suggested accommodation may be removed, replaced, and/or updated based on the updated level of importance assigned to keywords 773, 784, and 785.
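One plausible (purely illustrative) way to turn circle-to-keyword distances into ranking weights is an inverse-distance scheme, sketched below; the per-keyword match scores attached to each suggestion are assumptions, not a format the disclosure defines.

```python
def keyword_weights(distances):
    # Smaller distance between circle and keyword => larger weight.
    inv = {k: 1.0 / (d + 1e-6) for k, d in distances.items()}
    total = sum(inv.values())
    return {k: v / total for k, v in inv.items()}

def rank(suggestions, weights):
    # Each suggestion carries hypothetical per-keyword match scores in [0, 1].
    score = lambda s: sum(weights[k] * s["scores"][k] for k in weights)
    return sorted(suggestions, key=score, reverse=True)

weights = keyword_weights({"Quiet": 0.2, "Unique": 0.9, "Cozy": 0.5})
best = rank(
    [{"name": "Cottage", "scores": {"Quiet": 0.9, "Unique": 0.4, "Cozy": 0.8}},
     {"name": "Hotel",   "scores": {"Quiet": 0.3, "Unique": 0.2, "Cozy": 0.4}}],
    weights,
)[0]
print(best["name"])   # -> "Cottage"
```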
- the user may edit the keywords, e.g., by tapping on the graphical component associated with a keyword to change the keyword to a different keyword, by typing a different keyword, and/or providing additional natural language input such as, e.g., “Find me accommodations that are modern, minimalist, and spacious.”
- Computing system 700 may then replace suggested sub-widgets 781, 782, and 783 with suggested sub-widgets that each correspond to a new suggested accommodation determined to be associated with the new keywords.
- GUIs and graphical components generated from the set of instructions may be associated with generated functionality that can help complete a user’s tasks in a “shortcut” manner.
- GUIs and graphical components can be fine-tuned and customized by the user, in that the user can quickly and easily change suggested input parameters for completing their tasks, as well as the design and layout of the category GUIs and graphical components (e.g., widgets).
- FIG. 8 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
- Computing system 800 may be similar if not substantially similar to computing system 100 of FIG. 1, computing system 200 of FIG. 2, computing system 400 of FIG. 4, computing system 500 of FIG. 5, computing system 600 of FIG. 6 and computing system 700 of FIG. 7.
- Computing device 812 may be similar if not substantially similar to computing device 112 of FIG. 1, computing device 412 of FIG. 4, computing device 512 of FIG. 5, computing device 612 of FIG. 6, and computing device 712 of FIG. 7.
- User interface (UI) components 802 may be similar if not substantially similar to UI components 102 of FIG. 1, UI components 402 of FIG. 4, UI components 502 of FIG. 5, UI components 602 of FIG. 6, and UI components 702 of FIG. 7A.
- computing system 800 may receive one or more of an additional indication of a user input and context information from the one or more applications, in which computing system 800 may update, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface.
- For example, in the example of FIG. 8, computing system 800 may generate GUI 866 (which may be similar if not substantially similar to GUI 766 of FIGS. 7A and 7B), in which GUI 866 includes widget 876 that displays a message from John (“JR”) saying “I’ve bought our flights.” That is, in some examples, the set of instructions including instructions for generating GUI 866 and one or more graphical components may be updated based on new context information retrieved from computing device 812. For example, computing system 800 may retrieve information related to the message received from John R., and may generate instructions for generating widget 876. Furthermore, computing system 800 may generate widget 877 titled, “Days until trip,” which may display the number of days until the user’s trip starts.
- computing system 800 may update the set of instructions to include instructions for generating widget 878, titled “Cottage Booked,” which may replace widget 767A of FIG 7A and/or widget 767B of FIG. 7B.
- widget 878 may provide information pertaining to the completed task of booking the cottage accommodation, such as the dates (e.g., “Nov 1st - Nov 7th”), the address of the accommodation (“123 Fifth Street”), and other relevant information, such as the check-in time (“Check-In at 12 PM”).
- GUI 866 may include text summary 875 generated by machine learning module 810, which may provide a summary of the tasks and/or any retrieved information associated with the category, such as tasks or subtasks that have been completed, tasks or subtasks that have not been completed, relevant information retrieved from computing device 812, etc.
- text summary 875 includes sentences such as, “Flights and accommodations have been booked.”
- computing system 800 may continuously or periodically retrieve information from computing device 812, such as existing functionality and context information (e.g., received messages, notifications, etc.) from applications executing on computing device 812, and generate the set of instructions to include instructions for generating new or updated GUIs and graphical components that are based on the retrieved information.
- computing system 800 may prevent users from performing subtasks that have already been completed (e.g., computing system 800 may prevent the user from purchasing airline tickets when John has already purchased them).
- computing system 800 may generate suggested graphical components associated with functionality for performing subtasks that computing system 800 determines to be relevant to a larger task. For example, in the example of FIG. 8, computing system 800 may generate a set of instructions that includes instructions for generating suggested widget 880 titled “Restaurants in the area,” which may be associated with functionality for booking a dinner reservation at a restaurant in a location that is close to the address for the booked cottage accommodation. As shown, suggested widget 880 may be “grayed-out,” that is, suggested widget 880 may be a suggested widget that computing system 800 has determined to be associated with a lower level of priority.
- Suggested widget 880 may be displayed at a bottom portion of GUI 866, and the functionality of suggested widget 880 may only be implemented responsive to a user interacting with suggested widget 880 so as to “accept” the suggestion (e.g., the user may click on suggested widget 880 to activate or enable suggested widget 880).
- suggested widget 880 may be a scrollable widget that includes one or more suggested sub-widgets each corresponding to a suggested restaurant, in which the one or more suggested restaurants may be determined by computing system 800 based on information retrieved from computing device 812 (e.g., historical user data or preferences that indicate a user’s preferred genre of food, restaurant rating, budget, etc.).
- computing system 800 may identify tasks based on the user intents, and computing system 800 may generate instructions for generating categorized GUIs, in which the categorized GUIs include organized task widgets that are prepopulated with suggested input and provide functionality for performing the identified tasks. Furthermore, computing system 800 may utilize information retrieved from computing device 812 to update, finetune, and intuitively determine the GUIs and task widgets for completing a user’s tasks. In this way, the techniques described herein may reduce the mental load, complexity, and time required for users to complete various tasks, and therefore may provide an overall improved user experience when operating user devices.
- FIG. 9 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality to a companion device, in accordance with one or more techniques of this disclosure.
- a user 920 interacts with computing device 912 that is in communication with computing system 900.
- computing system 900 may be similar if not substantially similar to computing system 100 of FIG. 1 and computing system 200 of FIG. 2.
- Computing device 912 may be similar if not substantially similar to computing device 112 of FIG. 1.
- GUI 916 may be similar if not substantially similar to GUI 116 of FIG. 1.
- User interface (UI) components 902 may be similar if not substantially similar to UI components 102 of FIG. 1 and UI components 202 of FIG. 2.
- Network 901 may be similar if not substantially similar to network 101 of FIG. 1.
- companion device 981 includes UI components 982.
- companion device 981 is in communication with computing system 900 and computing device 912 via network 901.
- GUI 916 may include a number of application widgets, such as application widgets 915F-915I.
- the techniques described herein may also provide users a “shortcut” to desired application functionality for a single application.
- Computing system 900 may be configured to generate instructions for performing an action that can be performed via, for example, a touch and talk feature (described in more detail below), rather than by the user navigating through the application. For example, a user may provide an input such as “Put a star emoji next to Thursday’s meeting,” while holding down on a calendar app widget.
- Many applications and GUIs executed on computing devices are often limited by design space. That is, in the instance of the calendar application, multiple screens and/or user interface components may be required to provide all of the application’s functionality.
- computing system 900 may generate instructions for performing this action automatically on behalf of the user. In this way, users may not be required to navigate through applications to find their desired functionality or perform specific actions. Instead, users may simply interact with a single application widget and access their desired functionality and/or have their desired actions performed immediately. That is, the techniques described herein may provide user 920 with a mechanism to “shortcut” the complexity of various actions.
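- One way such a “shortcut” could be dispatched is sketched below in Python, assuming hypothetical system.app_for_widget(), system.retrieve_functions(), system.interpret(), and system.execute() interfaces; none of these names are defined by this disclosure.

    def handle_touch_and_talk(widget_id, utterance, system):
        """Execute a spoken command scoped to the widget the user is
        holding, without requiring navigation through the application."""
        app = system.app_for_widget(widget_id)           # e.g., calendar app
        functions = system.retrieve_functions(app)       # app capabilities
        action = system.interpret(utterance, functions)  # model-backed parse
        system.execute(action)  # e.g., put a star emoji next to a meeting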
- computing system 900 may retrieve, using API module 906, a first set of instructions (e.g., API response data) associated with a first plurality of functions included in an application.
- the “first plurality of functions” may be functions, or “functionality”, e.g., capabilities or features of an application, that are provided by the values, settings, or other data that are directly embedded into the source code of an application, rather than those that are dynamically generated or configurable at runtime.
- the “first plurality of functions” may include functionality provided by values, logic, etc. that are fixed, e.g., “hard-coded”, in an application’s source code, and cannot be easily changed without modifying the code itself.
- first plurality of functions may be considered statically defined functions, or functions that are predefined at compile time or build time and do not change during execution.
- the “first set of instructions associated with a plurality of functions” described herein may refer to information, data, etc. that can be retrieved, e.g., via an API, from one or more applications installed on a computing device, such as computing device 912.
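- For concreteness, a first set of instructions retrieved via an API might resemble the hypothetical payload below, expressed here as a Python dictionary; the schema and field names are illustrative assumptions, not an actual API response format.

    # Hypothetical API response data describing an application's statically
    # defined functions (the "first plurality of functions").
    first_set_of_instructions = {
        "app": "banking",
        "functions": [
            {"name": "transfer_funds",
             "params": ["recipient", "amount"],
             "description": "Transfer funds to another account or contact"},
            {"name": "show_balance",
             "params": ["account"],
             "description": "Display the balance of a given account"},
        ],
    }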
- Computing system 900 may also receive, from computing device 912, and provided that user 920 has given explicit consent, an indication of a natural language user input (e.g., audio or text input from user 920) associated with one or more functions from the first plurality of functions included in the application.
- the indication of a natural language user input may represent user 920’s command or desired functionality for the application.
- computing system 900 may apply machine learning module 910 to the received natural language user input to generate a second set of instructions (e.g., code) that includes instructions for generating a corresponding GUI, graphical component, and/or the user’s desired functionality for the application.
- the “second set of instructions” may be dynamically generated at runtime based on user input and retrieved information, including data associated with the predefined or statically defined functions, capabilities, or features from the one or more applications. That is, the second set of instructions may be associated with one or more functions from a second plurality of functions.
- the “second plurality of functions” may be considered dynamically generated or configurable functions that may adapt or change based on input data and/or other conditions at runtime.
- the “second plurality of functions” may be considered to be “included in” one or more applications, in that the second plurality of functions may be based on the first plurality of functions, and are determined to be possible functions for the one or more applications (e.g., the second plurality of functions may not include functions for performing a funds transfer if no banking applications are installed).
- the second set of instructions may be considered dynamically generated code that provides corresponding GUIs, GUI components, and/or application functionality based on user input.
- computing system 900 may generate new code that provides user 920’s desired functionality, so long as the desired functionality is determined to be a possible functionality for the application (e.g., machine learning module 910 may determine whether the desired functionality is reasonable for the application, and/or computing system 900 may determine whether an API request can return information required for the desired functionality).
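- A minimal sketch of this gated generation step is shown below; llm.is_reasonable() and llm.generate_code() are hypothetical stand-ins for the feasibility check and runtime code generation described above.

    def generate_second_set(nl_input, first_set, llm):
        """Generate a second set of instructions (dynamic code) only when
        the requested functionality is plausible for the application."""
        if not llm.is_reasonable(nl_input, first_set):
            return None  # e.g., decline "make a phone call" for a banking app
        # New code is generated at runtime, grounded in the statically
        # defined functions retrieved via the API.
        return llm.generate_code(nl_input, first_set)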
- Computing system 900 may then send the second set of instructions to computing device 912, in which the computing device may then use the second set of instructions to generate a corresponding GUI component (e.g., a widget) on GUI 916 as well as provide the user’s desired functionality for the application.
- the corresponding GUI component may be or include at least one graphical component associated with the one or more functions from the second plurality of functions. That is, the corresponding GUI component (e.g., widget) may include graphical components and/or graphical elements that are associated with or provide user 920’s desired functionality.
- computing system 900 may retrieve, using API module 906, a first set of instructions associated with a first plurality of functions included in a weather application (e.g., represented by widget 915F) executing at computing device 912.
- Computing system 900 may then receive an indication of a natural language user input that is associated with one or more functions from the first plurality of functions included in the weather application (e.g., user 920 may provide a voice input to computing device 912 such as “Current temperature” when interacting with (e.g., pressing down on) widget 915F).
- Computing system 900 may then apply, using the first set of instructions, machine learning module 910 to the “Current temperature” input to generate a second set of instructions associated with one or more functions from a second plurality of functions (e.g., functions that are not statically defined or predefined at compile time or build time for the weather application, such as functions for displaying information via a new GUI or graphical component).
- the second set of instructions may include instructions for generating a GUI or a GUI component to display the current temperature, such as widget 984.
- Computing system 900 may then send, to computing device 912, the second set of instructions.
- widget 984 may be displayed on GUI 916 and provide the current temperature to user 920 without user 920 having to navigate through the larger weather application.
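- A second set of instructions for a widget like widget 984 might, under the assumptions above, look something like the hypothetical structure below; the format is illustrative only and is not an actual instructions-file schema.

    # Hypothetical generated instructions for a "Current temperature" widget.
    widget_984_instructions = {
        "widget_id": "984",
        "source_app": "weather",
        "layout": {"type": "card", "title": "Current temperature"},
        # Bind the widget to a function exposed by the weather application.
        "binding": {"function": "get_current_temperature", "refresh": "15m"},
    }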
- widget 983 may have been generated based on the predefined functions in a banking application represented by widget 915G and user 920 providing an input such as “Show checking account balance.”
- computing system 900 generated instructions for generating widget 983 and providing user 920’s desired functionality.
- widget 985 may have been generated based on the predefined functions in an Internet browser application represented by widget 915H and user 920 providing an input such as “Convert cups to milliliters.”
- computing system 900 generated instructions for generating widget 985 and providing user 920’s desired functionality.
- new widgets 983, 984, and 985 may be saved or presented on GUI 916 of computing device 912 for future use.
- computing system 900 may generate instructions for a new GUI that includes at least one graphical component associated with the one or more functions from the second plurality of functions. That is, in some examples, new widgets 983, 984, and 985 may be saved or presented on a new GUI that is different from GUI 916.
- computing system 900 generates instructions for a GUI component (e.g., a widget) associated with one or more suggested natural language user inputs.
- the one or more suggested natural language user inputs may be based on one or more historical natural language user inputs.
- the natural language user input may be provided via a “touch and talk” feature.
- user 920 may hold down on widget 915G (which represents a banking application) with their finger, a gesture that may activate a user interface component 902 (e.g., a microphone) of computing device 912.
- While holding down on widget 915G, and without opening the banking application, user 920 may also be presented with GUI component 993, which may be considered a “pop-up widget” that displays suggested inputs or commands for the banking application.
- a pop-up widget is typically designed to be temporary and to overlay the existing content on a screen; accordingly, GUI component 993 may disappear, e.g., once user 920 stops interacting with widget 915G.
- GUI component 993 includes suggested input “Send $20 to Jane...”
- the suggested input or inputs provided by GUI component 993 may be based on one or more capabilities provided by the application (e.g., transferring funds from one bank account to another, paying a bill, etc.).
- the suggested input may represent one or more functions from the first plurality of functions included in an application.
- the suggested input or inputs may be based on user 920 providing such inputs previously (i.e., based on one or more historical natural language user inputs) when executing or interacting with the banking application.
- the suggested input may be based on actions frequently performed by a user. For example, if a user frequently navigates through multiple screens to check their account balance, the suggested input may include an input that results in the generation of widget 983.
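- One simple, assumed heuristic for producing such suggestions is to rank a user’s historical inputs by frequency, as in the Python sketch below; a real implementation would also weigh application capabilities and frequently performed multi-screen actions.

    from collections import Counter

    def suggest_inputs(history, top_k=3):
        """Return the user's most frequent historical inputs as suggestions."""
        return [text for text, _ in Counter(history).most_common(top_k)]

    # "Show checking account balance" surfaces first because it occurs most.
    print(suggest_inputs([
        "Show checking account balance",
        "Send $20 to Jane",
        "Show checking account balance",
    ]))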
- computing system 900 may generate a second GUI that includes at least one graphical component associated with one or more suggested natural language user inputs. For example, instead of a pop-up widget, computing system 900 may generate instructions for an overlay GUI that displays one or more suggested natural language user inputs.
- a microphone or other UI component 902 on computing device 912 may capture the input and send, to computing system 900, the input as the indication of the natural language user input.
- computing system 900 may receive the indication of the natural language user input with context information, e.g., the input received by computing system 900 may further indicate that user 920 was holding down on the banking application widget 915G. As such, computing system 900 may more accurately determine user 920’s intent of transferring funds from their bank account.
- the “touch and talk” feature may “decouple” an application’s user interface from the application’s functionality and capabilities.
- For example, while interacting with widget 915G (which represents the banking application), user 920 may provide an input associated with a desired application function or capability that is not statically defined in the application’s source code or predefined at compile time or build time.
- a user is not restricted to providing inputs associated with functionality and capabilities already provided by application developers, and users can customize applications so long as their desired functions or capabilities are within reason (e.g., as determined by computing system 900) or adhere to a set of rules pertaining to the application (e.g., while a user may request a new GUI and/or graphical component to be generated from the banking application that provides functionality for transferring funds to a contact, the user may not be able to request a new GUI and/or graphical component to be generated from the banking application that provides functionality for making a phone call).
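- The rule check described above could take a form as simple as the allowlist sketch below; the per-application rule set and function names are hypothetical.

    # Hypothetical per-application capability rules: requested functions
    # must fall within what is reasonable for the application.
    APP_RULES = {
        "banking": {"transfer_funds", "show_balance", "pay_bill"},
    }

    def is_request_allowed(app, requested_function):
        """Reject requests outside the application's permitted capabilities,
        e.g., a banking application cannot be asked to place a phone call."""
        return requested_function in APP_RULES.get(app, set())

    assert is_request_allowed("banking", "transfer_funds")
    assert not is_request_allowed("banking", "make_phone_call")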
- Various aspects of the techniques described in this disclosure may facilitate better user experience with applications executing on user devices. For example, rather than a user having to navigate through multiple user interfaces in an application to access their desired information or functionality, a user may simply touch the main widget for the application, say their intent or command, and the computing device may provide a new widget that displays the desired information and/or provides the desired functionality. As such, the techniques described may provide more assistance to users when interacting with devices and applications, and may improve overall user experience when interacting with devices and applications. Furthermore, provided that the techniques described include generating new code based on user intent, users may be able to personalize the functionality of applications with which they interact without requiring a developer of the application to actually add features or otherwise update the application.
- API module 906 may retrieve a first set of instructions (e.g., API response data, etc.) from an application executing on computing device 912, which user interface generator module 908 may interpret in order to understand the functionality provided by the application. Interface generator module 908 may further use the first set of instructions and other device information (e.g., user interaction information) to contextualize the indication of a natural language user input when applying machine learning module 910. For example, continuing with the banking example above, interface generator module 908 may receive the “Send $20 to Jane...” input string in addition to a first set of instructions associated with the funds transfer capabilities included in the banking application. As such, machine learning module 910 may receive more context for the user input and thus more accurately interpret the user input.
- computing system 900 may determine which accessibility actions are frequently performed by user 920 when interacting with a GUI or application such that the new GUIs and/or GUI components generated by user interface generator module 908 can be better tailored for user 920’s needs. For example, in the case where user 920 is unable to provide a text input, user interface generator module 908 may generate instructions for a GUI such as widget 985 that provides user 920 the functionality of widget 985 when user 920 provides a voice command such as “Convert 12 cups to milliliters.”
- API module 906 may retrieve information pertaining to every element included in a GUI, such as GUI 916, no matter the type of element.
- the first set of instructions may include accessibility information.
- the accessibility information may be associated with a “view hierarchy” of a GUI of the application executing at the computing device, wherein the GUI may be represented as a tree of GUI views. In some examples, this hierarchy may demonstrate a hierarchy of information presented via a GUI, such as a category, subcategory, and sub-subcategory.
- the first set of instructions may include information associated with a plurality of user interface elements included in the application.
- computing system 900 may retrieve information associated with the plurality of user interface elements included in GUI 916 via API module 906, wherein the information includes one or more of a node type, textual content associated with a node, an action that can be performed on a node, a relationship between one or more nodes, or a plurality of accessibility features included in a node.
- interface generator module 908 may use this information to determine the format, size, color scheme, accessibility features, or any other features to include in the second set of instructions (e.g., new code) for generating a new GUI component (e.g., new customized widget) and functionality for an application.
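- The view-hierarchy information described above might be modeled and traversed as in the following sketch, where the node fields mirror the kinds of data listed (node type, textual content, actions, children); the class and field names are assumptions for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class ViewNode:
        node_type: str
        text: str = ""
        actions: list = field(default_factory=list)
        children: list = field(default_factory=list)

    def collect_actions(node):
        """Walk the GUI tree and gather (text, action) pairs the interface
        generator could consider when composing a new widget."""
        found = [(node.text, a) for a in node.actions]
        for child in node.children:
            found.extend(collect_actions(child))
        return found

    root = ViewNode("screen", children=[
        ViewNode("button", text="Transfer", actions=["click"]),
    ])
    print(collect_actions(root))  # [('Transfer', 'click')]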
- computing system 900 may also provide users the ability to configure various accessibility and/or display options according to their needs. For example, a user may be able to adjust the user interface components of GUI 916, such as text size, enable color correction, set up magnification gestures, and configure gesture-based navigation for GUI 916.
- a user may also edit or update the desired functionality and new GUI and/or GUI component for an application.
- computing system 900 may receive an updated natural language user input (e.g., a user may provide a voice command such as “Show three most recent transactions instead” to edit widget 983).
- Computing system 900 may then apply machine learning module 910 to the updated natural language user input to update the second set of instructions (e.g., to update instructions file 350 of FIG. 3C), wherein the second set of instructions then includes instructions for generating an updated GUI component (e.g., an updated widget 983 that shows the user’s three most recent transactions instead of their checking balance).
- Computing system 900 may then send the updated second set of instructions to computing device 912 to display the updated GUI component and functionality to the user via GUI 916.
- the updated second set of instructions may include instructions for generating an updated GUI (e.g., in examples in which a new GUI was generated based on the natural language user input). Additionally, as described above, in some examples, the user may be prompted to clarify their intent if the natural language user input is unclear.
- computing system 900 may generate an intermediate set of instructions for generating a GUI component or prompt with which the user may interact to clarify their intent or input. Responsive to receiving the clarified natural language user input, computing system 900 may then generate the updated set of instructions.
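- The update-and-clarify flow described above is sketched in Python below; llm.is_clear(), llm.ask_clarification(), and llm.revise() are hypothetical hooks for the intermediate prompt and regeneration steps.

    def update_widget(nl_update, instructions, llm):
        """Apply an updated natural language input to an existing
        instruction set, prompting for clarification when intent is
        ambiguous."""
        if not llm.is_clear(nl_update, instructions):
            # Intermediate set of instructions: present a prompt the user
            # can interact with to clarify their intent.
            nl_update = llm.ask_clarification(nl_update)
        # e.g., "Show three most recent transactions instead" yields an
        # updated widget in place of the checking-balance widget.
        return llm.revise(instructions, nl_update)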
- the second set of instructions for generating the new GUI component and application functionality may be shared between users.
- computing system 900 may receive, from computing device 912, a request to send the second set of instructions to companion device 981 that is associated with computing device 912.
- Computing system 900 may then send the second set of instructions to companion device 981 to display the new GUI component and functionality to another user via UI components 982.
- users may share GUIs and widgets, such as widgets 983, 984, 985, and/or other GUIs and widgets described herein, and new application functionality with each other.
- a first user operating computing device 912 may send a widget to a second user operating companion device 981 via, for example, Short Message Service (SMS).
- the first user may copy and paste widget 984 into a text message that is then sent to the second user, in which the second user may copy and paste widget 984 onto a home screen of companion device 981.
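- Because the widget is defined by its instruction set, sharing can amount to serializing those instructions for transport and regenerating the widget on the receiving device, as in the sketch below; transport.send() and home_screen.add_widget() are hypothetical interfaces.

    import json

    def share_widget(instructions, transport):
        """Serialize a widget's instruction set and send it, e.g., over SMS."""
        transport.send(json.dumps(instructions))

    def receive_widget(payload, home_screen):
        """Rebuild the shared widget from the serialized instructions."""
        home_screen.add_widget(json.loads(payload))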
- computing device 912 may retrieve, using an API, a first set of instructions associated with a first plurality of functions included in an application executing at computing device 912, such as a banking application represented by widget 915G.
- Computing device 912 may then receive, from a user operating computing device 912 and via UI components 902, an indication of a natural language user input associated with one or more functions from the first plurality of functions included in the application.
- Computing device 912 may then apply, using the first set of instructions, a machine learning model, such as a large language model, to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions (e.g., a user’s desired functions), wherein the second set of instructions includes instructions for generating a GUI component (e.g., widget 983) that may provide the user’s desired functionality.
- computing device 912 may generate a second set of instructions (e.g., instructions file 350 of FIG. 3C) including instructions for generating a user’s desired application functionality and/or an associated GUI component on GUI 916.
- FIG. 10 is a flowchart illustrating an example operation for dynamically generating custom graphical user interfaces for one or more applications, in accordance with one or more techniques of this disclosure. The example of FIG. 10 is described with respect to FIGS. 1-8.
- Computing system 100 retrieves information associated with a plurality of functions included in one or more applications (1086). In some examples, computing system 100 also retrieves one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with one or more graphical user interfaces. In some examples, computing system 100 stores the retrieved information in instructions storage 222. Computing system 100 receives an indication of a natural language user input associated with the plurality of functions included in the one or more applications (1087). In some examples, the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display.
- computing system 100 applies speech-to-text module 226 to convert audio data indicative of the natural language user input to text data.
- computing system 100 applies machine learning module 310 to the indication of the natural language input or the text data to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories (1088).
- computing system 100 applies machine learning module 310 to the indication of the natural language input or the text data to identify the one or more categories.
- computing system 100 applies language model module 342 including a large language model to the indication of the natural language input or the text data to identify the one or more tasks and/or the one or more categories.
- Computing system 100 applies, using the information associated with the plurality of functions and/or other information stored in instructions storage 222, machine learning module 310 to the one or more tasks to generate instructions file 350 including a set of instructions (1089).
- the set of instructions provides at least one function for performing a respective task from the one or more tasks.
- instructions file 350 includes instructions for generating at least one GUI associated with the respective category, in which the at least one GUI associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
- instructions file 350 may include instructions for generating GUI 417 associated with a “Family” category (demonstrated by “FAMILY” header 451), and GUI 766 associated with a “Trip” category (demonstrated by “TRIP” header 765).
- GUI 417 may include widget 452 associated with at least one function for performing an identified task, and widget 455 associated with at least one function for performing another identified task, in which both tasks are associated with the “Family” category.
- GUI 766 may include widget 767A associated with at least one function for performing an identified task associated with the “Trip” category.
- computing system 100 receives one or more of an additional indication of a user input and context information from the one or more applications, and updates, based on one or more of the additional indication of a user input and the context information, the at least one GUI.
- computing system 100 may receive an additional indication of a user input 664 that includes the query, “How many days until the kids start school?”
- Computing system 100 may also receive context information, e.g., from a calendar application. Based on additional indication of a user input 664 and the context information, computing system 100 may update GUI 617 to include widget 663 that is associated with functionality for counting the days until the user’s children start the next school year.
- a GUI associated with a respective category includes one or more of at least one graphical component including text data associated with the respective category, at least one graphical component including text data associated with information from the one or more applications, at least one graphical component associated with one or more suggested inputs, and at least one suggested graphical component associated with the at least one function for performing the respective task.
- GUI 866 may include text header 865 “TRIP” associated with the “Trip” category, and GUI 866 may further include text summary 875 associated with information from the one or more applications, in which text summary 875 provides a short summary of the subtasks and information relevant to a task of, e.g., planning a trip.
- GUI 766 may include sub-widgets 781, 782, and 783, which may each be associated with one or more suggested inputs.
- GUI 866 may include suggested widget 880, which may be associated with at least one function for performing a task or subtask such as, e.g., booking a dinner reservation for the trip.
- the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of the indication of the natural language user input, historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
- text header 865 “TRIP” associated with the “Trip” category may be based on an indication of a natural language user input such as, “Plan the trip with John.”
- sub-widgets 781, 782, and 783 which may each be associated with one or more suggested inputs, may be based on historical natural language user inputs, context information from the one or more applications, user data, and/or information associated with GUI 766 that indicates, e.g., a user’s preferences.
- text summary 875 may be based on, e.g., context information retrieved from a messaging application that indicates John booked airline tickets, and information associated with GUI 866, such as widget 878 that indicates the trip accommodation has been booked.
- the at least one GUI associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs.
- responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, computing system 100 updates the at least one graphical component associated with the at least one function for performing the respective task.
- GUI 417 may include widget 455 including buttons 456 that each correspond to a suggested pediatrician. Responsive to a user selecting a button 456 that corresponds to “Pediatrician A,” computing system 100 updates widget 455 associated with the at least one function for performing the task of booking Jane’s appointment at Pediatrician A, in which widget 455 may then be updated to or replaced with widget 558 or widget 661.
- the at least one GUI associated with the respective category includes the at least one suggested graphical component, in which the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control.
- GUI 766 associated with the “Trip” category may include sub-widgets 781, 782, and 783, which may each be associated with one or more suggested inputs, and may be based on one or more of keywords 773, 785, and 784, draggable circles 774 each corresponding to a respective keyword, draggable circle 771, and slider 769.
- computing system 100 receives an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control, and updates, based on the indication of the user input, the at least one suggested graphical component. For example, computing system 100 may receive an indication that a user interacted with one of draggable circles 774 to assign a higher level of importance to “Quiet” keyword 773, in which computing system 100 may update suggested sub-widgets 781, 782, and 783 to include one or more updated suggestions that are determined by computing system 100 to be more associated with “Quiet” keyword 773.
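- One assumed way to realize this re-ranking is to score each suggestion against user-adjusted keyword weights, as sketched below; the per-keyword relevance scores and data layout are illustrative assumptions.

    def rerank_suggestions(suggestions, keyword_weights):
        """Re-rank suggested sub-widgets after the user raises or lowers a
        keyword's importance (e.g., dragging a circle to boost "Quiet")."""
        def score(s):
            return sum(keyword_weights.get(k, 0.0) * rel
                       for k, rel in s["relevance"].items())
        return sorted(suggestions, key=score, reverse=True)

    cottages = [
        {"name": "Cottage A", "relevance": {"Quiet": 0.9, "Water": 0.2}},
        {"name": "Cottage B", "relevance": {"Quiet": 0.3, "Water": 0.8}},
    ]
    # With "Quiet" weighted highest, Cottage A ranks first.
    print(rerank_suggestions(cottages, {"Quiet": 1.0, "Water": 0.5}))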
- the one or more applications are one or more applications executing at computing device 112, in which computing system 100 sends, to computing device 112, instructions file 350.
- computing system 100 receives, from computing device 112, a request to send instructions file 350 to a companion device associated with computing device 112, and computing system 100 sends, to the companion device, instructions file 350.
- the at least one graphical component includes a first graphical component and a second graphical component, in which the first graphical component is associated with a first function for performing the respective task, and the second graphical component is associated with a second function for performing the respective task.
- widget 767A may be associated with at least one function for performing a task of booking an accommodation.
- Widget 767A may further include sub-widget 768 which may be associated with a function for setting a budget for booking the accommodation, and sub-widget 770 which may be associated with a function for setting a preferred location for booking the accommodation.
- FIG. 11 is a flowchart illustrating another example operation for dynamically generating custom graphical user interfaces for one or more applications, in accordance with one or more techniques of this disclosure. The example of FIG. 11 is described with respect to FIGS. 1-9.
- Computing system 900 retrieves, using API module 906, information associated with a plurality of functions included in one or more applications (1190).
- the one or more applications are executing at computing device 912, such as an application associated with widget 915G.
- Computing system 900 receives an indication of a natural language user input associated with the plurality of functions included in the one or more applications (1191).
- the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display, such as GUI 916, that corresponds to a graphical component, such as widget 915G, associated with one of the one or more applications.
- computing system 900 may generate at least one graphical component, such as GUI component 993, associated with one or more suggested natural language user inputs.
- the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
- Computing system 900 applies, using the information associated with the plurality of functions included in the one or more applications, machine learning module 910 to the indication of the natural language user input to generate instructions file 350, in which instructions file 350 includes instructions for generating at least one graphical component, such as widget 983 (1192).
- language model module 342, which includes a large language model, is applied to the indication of the natural language user input.
- computing system 900 sends instructions file 350 to computing device 912.
- the at least one graphical component, such as widget 983 is associated with at least one function for performing a task.
- computing system 900 may receive, from computing device 912, a request to send instructions file 350 to companion device 981 that is associated with computing device 912, and then send, to companion device 981, instructions file 350.
- computing system 900 is configured to update instructions file 350 responsive to receiving an updated natural language user input.
- computing system 900 may receive the updated natural language user input, and apply machine learning module 910 to the updated natural language user input to update instructions file 350.
- Instructions file 350 may then include instructions for generating at least one updated graphical component.
- Computing system 900 may send, to computing device 912, updated instructions file 350.
- Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol.
- computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory or (2) a communication medium such as a signal or carrier wave.
- Data storage media may be any available media that may be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
- a computer program product may include a computer-readable medium.
- such computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other storage medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
- coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
- computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media.
- Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
- The term “processor,” as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
- the functionality described herein may be provided within dedicated hardware and/or software modules. Also, the techniques could be fully implemented in one or more circuits or logic elements.
- the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set).
- Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, various units may be combined in a hardware unit or provided by a collection of interoperative hardware units, including one or more processors, in conjunction with suitable software and/or firmware.
- a computer-readable storage medium comprises a non-transitory medium.
- the term “non-transitory” indicates that the storage medium is not embodied in a carrier wave or a propagated signal.
- a non-transitory storage medium may store data that can, over time, change (e.g., in RAM or cache).
- Example 1 A method includes retrieving, by a computing system, information associated with a plurality of functions included in one or more applications; receiving, by the computing system, an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and applying, by the computing system, and using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
- Example 2 The method of example 1, wherein the one or more applications are one or more applications executing at a computing device.
- Example 3 The method of example 2, wherein the method further includes sending, by the computing system and to the computing device, the set of instructions.
- Example 4 The method of example 3, wherein the method further includes receiving, by the computing system and from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and sending, by the computing system and to the companion device, the set of instructions.
- Example 5 The method of any of examples 1-4, wherein the machine learning model is a large language model.
- Example 6 The method of any of examples 1-5, wherein the at least one graphical component is associated with at least one function for performing a task.
- Example 7 The method of any of examples 1-6, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical component associated with an application from the one or more applications.
- Example 8 The method of example 7, wherein the method further includes generating, by the computing system, at least one graphical component associated with one or more suggested natural language user inputs.
- Example 9 The method of example 8, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
- Example 10 The method of any of examples 1-9, wherein the method further includes updating, by the computing system, the set of instructions responsive to receiving an updated natural language user input.
- Example 11 The method of example 10, wherein the method further includes receiving, by the computing system, the updated natural language user input; and applying, by the computing system, the machine learning model to the updated natural language user input to update the set of instructions, wherein the set of instructions includes instructions for generating at least one updated graphical component.
- Example 12 A computing system comprising: one or more processors; and one or more storage devices that store instructions, wherein the instructions, when executed by the one or more processors, cause the one or more processors to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
- Example 13 The computing system of example 12, wherein the one or more applications are one or more applications executing at a computing device.
- Example 14 The computing system of example 13, wherein the instructions further cause the one or more processors to: send, to the computing device, the set of instructions.
- Example 15 The computing system of example 14, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
- Example 16 The computing system of any of examples 12-15, wherein the machine learning model is a large language model.
- Example 17 The computing system of any of examples 12-16, wherein the at least one graphical component is associated with at least one function for performing a task.
- Example 18 The computing system of any of examples 12-17, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical component associated with an application from the one or more applications.
- Example 19 The computing system of example 18, wherein the instructions further cause the one or more processors to: generate at least one graphical component associated with one or more suggested natural language user inputs.
- Example 20 The computing system of example 19, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
- Example 21 The computing system of any of examples 12-20, wherein the instructions further cause the one or more processors to: update the set of instructions responsive to receiving an updated natural language user input.
- Example 22 The computing system of example 21, wherein the instructions further cause the one or more processors to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the set of instructions, wherein the set of instructions includes instructions for generating at least one updated graphical component.
- Example 23 A non-transitory computer-readable storage medium encoded with instructions that, when executed by one or more processors of a computing device, cause the one or more processors to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
- Example 24 The non-transitory computer-readable medium of example 23, wherein the one or more applications are one or more applications executing at a computing device.
- Example 25 The non-transitory computer-readable medium of example 24, wherein the instructions further cause the one or more processors to: send, to the computing device, the set of instructions.
- Example 26 The non-transitory computer-readable medium of example 25, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
- Example 27 The non-transitory computer-readable medium of any of examples 23-26, wherein the machine learning model is a large language model.
- Example 28 The non-transitory computer-readable medium of any of examples 23-27, wherein the at least one graphical component is associated with at least one function for performing a task.
- Example 29 The non-transitory computer-readable medium of any of examples 23-28, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical component associated with an application from the one or more applications.
- Example 30 The non-transitory computer-readable medium of example 29, wherein the instructions further cause the one or more processors to: generate at least one graphical component associated with one or more suggested natural language user inputs.
- Example 31 The non-transitory computer-readable medium of example 30, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
- Example 32 The non-transitory computer-readable medium of any of examples 23- 31, wherein the instructions further cause the one or more processors to: update the set of instructions responsive to receiving an updated natural language user input.
- Example 33 The non-transitory computer-readable medium of example 32, wherein the instructions further cause the one or more processors to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the set of instructions, wherein the set of instructions includes instructions for generating at least one updated graphical component.
- Example 34 A computer program product for generating custom graphical components for one or more applications, the computer program product comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
- Example 35 The computer program product of example 34, wherein the one or more applications are one or more applications executing at a computing device.
- Example 36 The computer program product of example 35, wherein the one or more instructions further cause the at least one processor to: send, to the computing device, the set of instructions.
- Example 37 The computer program product of example 36, wherein the one or more instructions further cause the at least one processor to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
- Example 38 The computer program product of any of examples 34-37, wherein the machine learning model is a large language model.
- Example 39 The computer program product of any of examples 34-38, wherein the at least one graphical component is associated with at least one function for performing a task.
- Example 40 The computer program product of any of examples 34-39, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical component associated with an application from the one or more applications.
- Example 41 The computer program product of example 40, wherein the one or more instructions further cause the at least one processor to: generate at least one graphical component associated with one or more suggested natural language user inputs.
- Example 42 The computer program product of example 41, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
- Example 43 The computer program product of any of examples 34-42, wherein the one or more instructions further cause the at least one processor to: update the set of instructions responsive to receiving an updated natural language user input.
- Example 44 The computer program product of example 43, wherein the one or more instructions further cause the at least one processor to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the set of instructions, wherein the set of instructions includes instructions for generating at least one updated graphical component.
- Example 45 A method includes retrieving, by a computing system, information associated with a plurality of functions included in one or more applications; receiving, by the computing system, an indication of a natural language user input associated with the plurality of functions included in the one or more applications; applying, by the computing system, a machine learning model to the indication of the natural language user input to identify one or more tasks, wherein each task from the one or more tasks is associated with a respective category from one or more categories; and applying, by the computing system, and using the information associated with the plurality of functions, the machine learning model to the one or more tasks to generate a set of instructions, wherein the set of instructions is associated with at least one function for performing a respective task from the one or more tasks, wherein the set of instructions includes instructions for generating at least one graphical user interface associated with the respective category, and wherein the at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
- Example 46 The method of example 45, wherein the method further includes applying, by the computing system, the machine learning model to the indication of the natural language user input to identify the one or more categories.
- Example 47 The method of any of examples 45 and 46, wherein the one or more applications are one or more applications executing at a computing device, wherein the method further includes sending, by the computing system and to the computing device, the set of instructions.
- Example 48 The method of example 47, wherein the method further includes receiving, by the computing system and from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and sending, by the computing system and to the companion device, the set of instructions.
- Example 49 The method of any of examples 45 through 48, wherein the at least one graphical user interface associated with the respective category includes one or more of: at least one graphical component including text data associated with the respective category; at least one graphical component including text data associated with information from the one or more applications; at least one graphical component associated with one or more suggested inputs, and at least one suggested graphical component associated with the at least one function for performing the respective task.
- Example 50 The method of example 49, wherein the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
- Example 51 The method of any of examples 49 and 50, wherein the at least one graphical user interface associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs, wherein the method further includes responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, updating, by the computing system, the at least one graphical component associated with the at least one function for performing the respective task.
- Example 52 The method of any of examples 49 through 51, wherein the at least one graphical user interface associated with the respective category includes the at least one suggested graphical component, and wherein the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control.
- Example 53 The method of example 52, wherein the method further includes receiving, by the computing system, an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control; and updating, by the computing system and based on the indication of the user input, the at least one suggested graphical component.
- Example 54 The method of any of examples 45 through 53, wherein the at least one graphical component includes a first graphical component and a second graphical component, wherein the first graphical component is associated with a first function for performing the respective task, and wherein the second graphical component is associated with a second function for performing the respective task.
- Example 55 The method of any of examples 45 through 54, wherein the method further includes receiving, by the computing system, one or more of an additional indication of a user input and context information from the one or more applications; and updating, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface.
- Example 56 The method of any of examples 45 through 55, wherein the machine learning model is a large language model.
- Example 57 A computing system includes one or more processors; and one or more storage devices that store instructions, wherein the instructions, when executed by the one or more processors, cause the one or more processors to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; apply a machine learning model to the indication of the natural language user input to identify one or more tasks, wherein each task from the one or more tasks is associated with a respective category from one or more categories; and apply, using the information associated with the plurality of functions, the machine learning model to the one or more tasks to generate a set of instructions, wherein the set of instructions is associated with at least one function for performing a respective task from the one or more tasks, wherein the set of instructions includes instructions for generating at least one graphical user interface associated with the respective category, and wherein the at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
- Example 58 The computing system of example 57, wherein the instructions further cause the one or more processors to: apply the machine learning model to the indication of the natural language user input to identify the one or more categories.
- Example 59 The computing system of any of examples 57 and 58, wherein the one or more applications are one or more applications executing at a computing device, and wherein the instructions further cause the one or more processors to: send, to the computing device, the set of instructions.
- Example 60 The computing system of example 59, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
- Example 61 The computing system of any of examples 57 through 60, wherein the at least one graphical user interface associated with the respective category includes one or more of: at least one graphical component including text data associated with the respective category; at least one graphical component including text data associated with information from the one or more applications; at least one graphical component associated with one or more suggested inputs, and at least one suggested graphical component associated with the at least one function for performing the respective task.
- Example 62 The computing system of example 61, wherein the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
- Example 63 The computing system of any of examples 61 and 62, wherein the at least one graphical user interface associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs, wherein the instructions further cause the one or more processors to: responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, update the at least one graphical component associated with the at least one function for performing the respective task.
- Example 64 The computing system of any of examples 61 through 63, wherein the at least one graphical user interface associated with the respective category includes the at least one suggested graphical component, and wherein the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control.
- Example 65 The computing system of example 64, wherein the instructions further cause the one or more processors to: receive an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control; and update, based on the indication of the user input, the at least one suggested graphical component.
- Example 66 The computing system of any of examples 57 through 65, wherein the at least one graphical component includes a first graphical component and a second graphical component, wherein the first graphical component is associated with a first function for performing the respective task, and wherein the second graphical component is associated with a second function for performing the respective task.
- Example 67 The computing system of any of examples 57 through 66, wherein the instructions further cause the one or more processors to: receive one or more of an additional indication of a user input and context information from the one or more applications; and update, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface.
- Example 68 The computing system of any of examples 57 through 67, wherein the machine learning model is a large language model.
- Example 69 The computing system of any of examples 57 through 68, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display.
- Example 70 A non-transitory computer-readable storage medium encoded with instructions that, when executed by one or more processors, cause one or more processors to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; apply a machine learning model to the indication of the natural language user input to identify one or more tasks, wherein each task from the one or more tasks is associated with a respective category from one or more categories; and apply, using the information associated with the plurality of functions, the machine learning model to the one or more tasks to generate a set of instructions, wherein the set of instructions is associated with at least one function for performing a respective task from the one or more tasks, wherein the set of instructions includes instructions for generating at least one graphical user interface associated with the respective category, and wherein the at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
- Example 71 The non-transitory computer-readable storage medium of example 70, wherein the instructions further cause the one or more processors to: apply the machine learning model to the indication of the natural language user input to identify the one or more categories.
- Example 72 The non-transitory computer-readable storage medium of any of examples 70 and 71, wherein the one or more applications are one or more applications executing at a computing device, and wherein the instructions further cause the one or more processors to: send, to the computing device, the set of instructions.
- Example 73 The non-transitory computer-readable storage medium of example 72, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
- Example 74 The non-transitory computer-readable storage medium of any of examples 70 through 73, wherein the at least one graphical user interface associated with the respective category includes one or more of: at least one graphical component including text data associated with the respective category; at least one graphical component including text data associated with information from the one or more applications; at least one graphical component associated with one or more suggested inputs; and at least one suggested graphical component associated with the at least one function for performing the respective task.
- Example 75 The non-transitory computer-readable storage medium of example 74, wherein the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
- Example 76 The non-transitory computer-readable storage medium of any of examples 74 and 75, wherein the at least one graphical user interface associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs, wherein the instructions further cause the one or more processors to: responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, update the at least one graphical component associated with the at least one function for performing the respective task.
- Example 77 The non-transitory computer-readable storage medium of any of examples 74 through 76, wherein the at least one graphical user interface associated with the respective category includes the at least one suggested graphical component, and wherein the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control.
- Example 78 The non-transitory computer-readable storage medium of example 77, wherein the instructions further cause the one or more processors to: receive an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control; and update, based on the indication of the user input, the at least one suggested graphical component.
- Example 79 The non-transitory computer-readable storage medium of any of examples 70 through 78, wherein the at least one graphical component includes a first graphical component and a second graphical component, wherein the first graphical component is associated with a first function for performing the respective task, and wherein the second graphical component is associated with a second function for performing the respective task.
- Example 80 The non-transitory computer-readable storage medium of any of examples 70 through 79, wherein the instructions further cause the one or more processors to: receive one or more of an additional indication of a user input and context information from the one or more applications; and update, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface.
- Example 81 The non-transitory computer-readable storage medium of any of examples 70 through 80, wherein the machine learning model is a large language model.
- Example 82 The non-transitory computer-readable storage medium of any of examples 70 through 81, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display.
- Example 83 A computer program product for generating custom user interfaces and functionality for performing tasks, the computer program product comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; apply a machine learning model to the indication of the natural language user input to identify one or more tasks, wherein each task from the one or more tasks is associated with a respective category from one or more categories; and apply, using the information associated with the plurality of functions, the machine learning model to the one or more tasks to generate a set of one or more instructions, wherein the set of instructions is associated with at least one function for performing a respective task from the one or more tasks, wherein the set of one or more instructions includes one or more instructions for generating at least one graphical user interface associated with the respective category, and wherein the at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
- Example 84 The computer program product of example 83, wherein the one or more instructions further cause the at least one processor to: apply the machine learning model to the indication of the natural language user input to identify the one or more categories.
- Example 85 The computer program product of any of examples 83 and 84, wherein the one or more applications are one or more applications executing at a computing device, and wherein the one or more instructions further cause the at least one processor to: send, to the computing device, the set of one or more instructions.
- Example 86 The computer program product of example 85, wherein the one or more instructions further cause the at least one processor to: receive, from the computing device, a request to send the set of one or more instructions to a companion device associated with the computing device; and send, to the companion device, the set of one or more instructions.
- Example 87 The computer program product of any of examples 83 through 86, wherein the at least one graphical user interface associated with the respective category includes one or more of: at least one graphical component including text data associated with the respective category; at least one graphical component including text data associated with information from the one or more applications; at least one graphical component associated with one or more suggested inputs; and at least one suggested graphical component associated with the at least one function for performing the respective task.
- Example 88 The computer program product of example 87, wherein the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
- Example 89 The computer program product of any of examples 87 and 88, wherein the at least one graphical user interface associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs, wherein the one or more instructions further cause the at least one processor to: responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, update the at least one graphical component associated with the at least one function for performing the respective task.
- Example 90 The computer program product of any of examples 87 through 89, wherein the at least one graphical user interface associated with the respective category includes the at least one suggested graphical component, and wherein the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control.
- Example 91 The computer program product of example 90, wherein the one or more instructions further cause the at least one processor to: receive an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control; and update, based on the indication of the user input, the at least one suggested graphical component.
- Example 92 The computer program product of any of examples 83 through 91, wherein the at least one graphical component includes a first graphical component and a second graphical component, wherein the first graphical component is associated with a first function for performing the respective task, and wherein the second graphical component is associated with a second function for performing the respective task.
- Example 93 The computer program product of any of examples 83 through 92, wherein the one or more instructions further cause the at least one processor to: receive one or more of an additional indication of a user input and context information from the one or more applications; and update, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface.
- Example 94 The computer program product of any of examples 83 through 93, wherein the machine learning model is a large language model.
- Example 95 The computer program product of any of examples 83 through 94, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display.
- Example 96 A computing system comprising means for performing any combination of the methods of examples 45-56.
- Example 97 A computer-readable storage medium encoded with instructions for performing any combination of the methods of examples 45-56.
- Example 98 A method includes retrieving, by a computing system, a first set of instructions associated with a first plurality of functions included in an application; receiving, by the computing system, an indication of a natural language user input associated with one or more functions from the first plurality of functions included in the application; and applying, by the computing system, and using the first set of instructions, a machine learning model to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions included in the application, wherein the second set of instructions includes instructions for generating a first graphical user interface.
- Example 99 The method of example 98, wherein the application is an application executing at a computing device.
- Example 100 The method of example 99, further comprising sending, by the computing system and to the computing device, the second set of instructions.
- Example 101 The method of example 100, further comprising: receiving, by the computing system and from the computing device, a request to send the second set of instructions to a companion device associated with the computing device; and sending, by the computing system and to the companion device, the second set of instructions.
- Example 102 The method of any of examples 98 through 101, wherein the machine learning model is a large language model.
- Example 103 The method of any of examples 98 through 102, wherein the first graphical user interface includes at least one graphical component associated with the one or more functions from the second plurality of functions included in the application.
- Example 104 The method of any of examples 98 through 103, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical user interface component associated with the application.
- Example 105 The method of example 104, further comprising: generating, by the computing system, a second graphical user interface, wherein the second graphical user interface includes at least one graphical component associated with one or more suggested natural language user inputs.
- Example 106 The method of example 105, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
- Example 107 The method of any of examples 98-106, wherein the computing system is configured to update the second set of instructions responsive to receiving an updated natural language user input.
- Example 108 The method of example 107, further comprising: receiving, by the computing system, the updated natural language user input; and applying, by the computing system, the machine learning model to the updated natural language user input to update the second set of instructions, wherein the second set of instructions includes instructions for generating an updated graphical user interface.
- Example 109 A computing system includes: one or more processors; and one or more storage devices that store instructions, wherein the instructions, when executed by the one or more processors, cause the one or more processors to: retrieve a first set of instructions associated with a first plurality of functions included in an application; receive an indication of a natural language user input associated with one or more functions from the first plurality of functions included in the application; and apply, using the first set of instructions, a machine learning model to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions included in the application, wherein the second set of instructions includes instructions for generating a first graphical user interface.
- Example 110 The computing system of example 109, wherein the application is an application executing at a computing device.
- Example 111 The computing system of example 110, wherein the instructions further cause the one or more processors to send, to the computing device, the second set of instructions.
- Example 112 The computing system of example 111, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the second set of instructions to a companion device associated with the computing device; and send, to the companion device, the second set of instructions.
- Example 113 The computing system of any of examples 109 through 112, wherein the machine learning model is a large language model.
- Example 114 The computing system of any of examples 109 through 113, wherein the first graphical user interface includes at least one graphical component associated with the one or more functions from the second plurality of functions included in the application.
- Example 115 The computing system of any of examples 109 through 114, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical user interface component associated with the application.
- Example 116 The computing system of example 115, wherein the instructions further cause the one or more processors to: generate a second graphical user interface, wherein the second graphical user interface includes at least one graphical component associated with one or more suggested natural language user inputs.
- Example 117 The computing system of example 116, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
- Example 119 The computing system of any of examples 109 through 117, wherein the instructions further cause the one or more processors to update the second set of instructions responsive to receiving an updated natural language user input.
- Example 120 The computing system of example 119, wherein the instructions further cause the one or more processors to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the second set of instructions, wherein the second set of instructions includes instructions for generating an updated graphical user interface.
- Example 121 A non-transitory computer-readable storage medium encoded with instructions that, when executed by one or more processors, cause one or more processors to: retrieve a first set of instructions associated with a first plurality of functions included in an application; receive an indication of a natural language user input associated with one or more functions from the first plurality of functions included in the application; and apply, using the first set of instructions, a machine learning model to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions included in the application, wherein the second set of instructions includes instructions for generating a first graphical user interface.
- Example 122 The non-transitory computer-readable medium of example 121, wherein the application is an application executing at a computing device.
- Example 123 The non-transitory computer-readable medium of example 122, wherein the instructions further cause the one or more processors to send, to the computing device, the second set of instructions.
- Example 124 The non-transitory computer-readable medium of example 123, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the second set of instructions to a companion device associated with the computing device; and send, to the companion device, the second set of instructions.
- Example 125 The non-transitory computer-readable medium of any of examples 121 through 124, wherein the machine learning model is a large language model.
- Example 126 The non-transitory computer-readable medium of any of examples 121 through 125, wherein the first graphical user interface includes at least one graphical component associated with the one or more functions from the second plurality of functions included in the application.
- Example 127 The non-transitory computer-readable medium of any of examples 121 through 126, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical user interface component associated with the application.
- Example 128 The non-transitory computer-readable medium of example 127, wherein the instructions further cause the one or more processors to: generate a second graphical user interface, wherein the second graphical user interface includes at least one graphical component associated with one or more suggested natural language user inputs.
- Example 129 The non-transitory computer-readable medium of example 128, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
- Example 130 The non-transitory computer-readable medium of any of examples 121 through 129, wherein the instructions further cause the one or more processors to update the second set of instructions responsive to receiving an updated natural language user input.
- Example 131 The non-transitory computer-readable medium of example 130, wherein the instructions further cause the one or more processors to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the second set of instructions, wherein the second set of instructions includes instructions for generating an updated graphical user interface.
- Example 132 A computer program product for generating custom graphical components for one or more applications, the computer program product comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to: retrieve a first set of instructions associated with a first plurality of functions included in an application; receive an indication of a natural language user input associated with one or more functions from the first plurality of functions included in the application; and apply, using the first set of instructions, a machine learning model to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions included in the application, wherein the second set of instructions includes instructions for generating a first graphical user interface.
- Example 133 The computer program product of example 132, wherein the application is an application executing at a computing device.
- Example 134 The computer program product of example 133, wherein the one or more instructions further cause the at least one processor to send, to the computing device, the second set of instructions.
- Example 135 The computer program product of example 134, wherein the one or more instructions further cause the at least one processor to: receive, from the computing device, a request to send the second set of instructions to a companion device associated with the computing device; and send, to the companion device, the second set of instructions.
- Example 136 The computer program product of any of examples 132 through 135, wherein the machine learning model is a large language model.
- Example 137 The computer program product of any of examples 132 through 136, wherein the first graphical user interface includes at least one graphical component associated with the one or more functions from the second plurality of functions included in the application.
- Example 138 The computer program product of any of examples 132 through 137, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical user interface component associated with the application.
- Example 139 The computer program product of example 138, wherein the one or more instructions further cause the at least one processor to: generate a second graphical user interface, wherein the second graphical user interface includes at least one graphical component associated with one or more suggested natural language user inputs.
- Example 140 The computer program product of example 139, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
- Example 141 The computer program product of any of examples 132 through 140, wherein the one or more instructions further cause the at least one processor to update the second set of instructions responsive to receiving an updated natural language user input.
- Example 142 The computer program product of example 141, wherein the one or more instructions further cause the at least one processor to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the second set of instructions, wherein the second set of instructions includes instructions for generating an updated graphical user interface.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
An example computing system may retrieve information associated with a plurality of predefined functions included in one or more applications, and may receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications. The computing system may apply, using the information associated with the plurality of functions, a large language model to the indication of the natural language user input to generate a set of instructions including instructions for generating one or more of at least one graphical user interface and at least one graphical component. In some examples, the computing system may apply the large language model to the indication of the natural language user input to identify one or more tasks. In some examples, the at least one graphical component is associated with at least one function for performing an identified task.
Description
USING LARGE LANGUAGE MODELS TO GENERATE USER INTERFACE COMPONENTS
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of US Provisional Patent Application No. 63/586,242, filed September 28, 2023, and US Provisional Patent Application No. 63/697,201, filed September 20, 2024, both of which are incorporated by reference herein in their entirety.
BACKGROUND
[0002] Applications executed on computing devices may provide a wide variety of functionalities to users. However, a user must typically interact with multiple user interface elements and/or screens of an application before they are able to access such functionalities. Users may find it challenging and/or time-consuming to navigate through an entire application, especially when the application layout is unknown or unintuitive. Additionally, applications may not always provide users the ability to create shortcuts that allow them to access functionalities in a timely manner.
SUMMARY
[0003] In general, techniques of this disclosure are directed to applying a large language model to natural language input in order to dynamically generate custom user interfaces, graphical components, and/or functionality for one or more applications. A remote computing device (e.g., a smartphone) may include one or more applications with a plurality of functions that may be statically defined or predefined at compile time and do not change during execution. A computing system in communication with the computing device may retrieve, with explicit user consent and using an application programming interface, information associated with the plurality of functions. The computing system may also receive an indication of a natural language user input (e.g., audio or text input from a user operating the remote computing device) associated with the plurality of functions. For example, the computing system may receive an indication of a voice input that includes multiple commands and/or user intents associated with one or more applications, such as “Send message to Jenny to arrange childcare, book doctor’s appointment for Jane, schedule the meeting with John, order dinner, and call the electrician.” Using the retrieved information associated with the plurality of functions, the computing system may apply a
machine learning model (e.g., a large language model) to the natural language user input to generate a set of instructions, e.g., new code, that provides corresponding user interfaces, graphical components (e.g., widgets), and/or a user’s desired application functionality. In one example, responsive to a user pressing down on a banking application widget with their finger and saying, “Current balance,” the computing system may generate instructions for displaying a new graphical component (e.g., a widget) that includes information pertaining to the user’s current balance. In some examples, the computing system may apply the machine learning model to the natural language user input to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories. For example, the machine learning model may identify “Send message to Jenny to arrange childcare,” and “book doctor’s appointment for Jane” as tasks associated with a “Family” category, and may identify “schedule the meeting with John” as a task associated with a “Work” category. Using the retrieved information associated with the plurality of functions, the computing system may apply the machine learning model to the identified tasks to generate a set of instructions that provides corresponding graphical user interfaces, graphical components, and/or application functionality for completing the identified tasks.
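Purely as an illustration of the pipeline described above, the following Python sketch shows one way the two model applications could be chained, assuming a generic `llm` callable that returns JSON text; the function names, prompt wording, and JSON schema are assumptions for this sketch, not details taken from the disclosure.

```python
import json
from typing import Callable

def generate_ui_instructions(
    llm: Callable[[str], str],   # any text-in/text-out large language model
    function_info: dict,         # API-derived metadata about application functions
    user_input: str,             # the natural language user input
) -> dict:
    # First application of the model: split the utterance into tasks and
    # assign each task a category (e.g., "Family", "Work").
    task_prompt = (
        "Split the request into tasks and assign each a category. "
        'Respond as a JSON list of {"task": ..., "category": ...} objects.\n'
        f"Request: {user_input}"
    )
    tasks = json.loads(llm(task_prompt))

    # Second application of the model: using the retrieved function
    # information, emit instructions for one GUI per category, with
    # graphical components wired to matching application functions.
    ui_prompt = (
        "Given these application functions and tasks, emit JSON instructions "
        "describing one graphical user interface per category, each containing "
        "widgets bound to the functions that perform its tasks.\n"
        f"Functions: {json.dumps(function_info)}\nTasks: {json.dumps(tasks)}"
    )
    return json.loads(llm(ui_prompt))
```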
[0004] In one example, the disclosure is directed toward a method that includes retrieving, by a computing system, information associated with a plurality of functions included in one or more applications, and receiving, by the computing system, an indication of a natural language user input associated with the plurality of functions included in the one or more applications. The method further includes applying, by the computing system, and using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
[0005] In another example, the disclosure is directed toward a computing system comprising one or more processors, and one or more storage devices that store instructions. The instructions, when executed by the one or more processors, cause the one or more processors to retrieve information associated with a plurality of functions included in one or more applications, and receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications. The instructions further cause the one or more processors to apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to
generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
[0006] In another example, the disclosure is directed toward a non-transitory computer-readable storage medium encoded with instructions. The instructions, when executed by one or more processors of a computing device, cause the one or more processors to retrieve information associated with a plurality of functions included in one or more applications, and receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications. The instructions further cause the one or more processors to apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
[0007] In another example, the disclosure is directed toward a computer program product for generating custom graphical components for one or more applications, the computer program product comprising one or more instructions. The one or more instructions, when executed by at least one processor, cause the at least one processor to retrieve information associated with a plurality of functions included in one or more applications, and receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications. The one or more instructions further cause the at least one processor to apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
[0008] The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF THE FIGURES
[0009] FIG. 1 is a conceptual diagram illustrating an example computing system for dynamically generating custom graphical user interfaces for performing one or more tasks identified in natural language input, in accordance with one or more techniques of this disclosure.
[0010] FIG. 2 is a block diagram illustrating another example computing system configured to apply a machine learning module to natural language input to dynamically generate custom graphical user interfaces, in accordance with one or more techniques of this disclosure.
[0011] FIG. 3A is a conceptual diagram illustrating an example training process for a machine learning module, in accordance with one or more techniques of this disclosure.
[0012] FIG. 3B is a conceptual diagram illustrating an example trained machine learning module, in accordance with one or more techniques of this disclosure.
[0013] FIG. 3C is a conceptual diagram illustrating a machine learning module configured to apply a large language model that accepts natural language input and provides code for corresponding graphical user interfaces and application functionality as output, in accordance with one or more techniques of this disclosure.
[0014] FIG. 4 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
[0015] FIG. 5 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
[0016] FIG. 6 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
[0017] FIG. 7A is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
[0018] FIG. 7B is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
[0019] FIG. 8 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for
performing tasks associated with a category, in accordance with one or more techniques of this disclosure.
[0020] FIG. 9 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality to a companion device, in accordance with one or more techniques of this disclosure.
[0021] FIG. 10 is a flowchart illustrating an example operation for dynamically generating custom graphical user interfaces for one or more applications, in accordance with one or more techniques of this disclosure.
[0022] FIG. 11 is a flowchart illustrating another example operation for dynamically generating custom graphical user interfaces for one or more applications, in accordance with one or more techniques of this disclosure.
DETAILED DESCRIPTION
[0023] FIG. 1 is a conceptual diagram illustrating an example computing system for dynamically generating custom graphical user interfaces for performing one or more tasks identified in natural language input, in accordance with one or more techniques of this disclosure. In the example of FIG. 1, a user 120 interacts with computing device 112 that is in communication with computing system 100. In some examples, some or all of the components and/or functionality attributed to computing system 100 may be implemented or performed by computing device 112.
[0024] While not explicitly shown in the example of FIG. 1, computing system 100 may be implemented on a plurality of computing devices that may include, but are not limited to, portable, mobile, or other devices, such as mobile phones (including smartphones), laptop computers, desktop computers, tablet computers, smart television platforms, server computers, mainframes, etc. In some examples, computing system 100 may represent a cloud computing system that provides one or more services via network 101. That is, in some examples, computing system 100 may be a distributed computing system.
[0025] Computing system 100 may communicate with computing device 112 via network 101. Network 101 may include any public or private communication network, such as a cellular network, Wi-Fi network, a direct cell-to-satellite communication network, or other type of network for transmitting data between computing system 100 and computing device 112. In some examples, network 101 may represent one or more packet switched networks, such as the Internet. Computing device 112 may send and receive data to and from computing system 100 across network 101 using any suitable communication techniques. For example,
computing system 100 and computing device 112 may each be operatively coupled to network 101 using respective network links. Network 101 may include network hubs, network switches, network routers, etc., that are operatively inter-coupled thereby providing for the exchange of information between computing device 112 and computing system 100. In some examples, network links of network 101 may be Ethernet, ATM or other network connections. Such connections may include wireless and/or wired connections.
[0026] As shown in the example of FIG. 1, computing device 112 includes one or more user interface (UI) components (“UI components 102”). UI components 102 of computing device 112 may be configured to function as input devices and/or output devices for computing device 112. UI components 102 may be implemented using various technologies. For instance, UI components 102 may be configured to receive input from user 120 through tactile, audio, and/or video feedback. Examples of input devices include a presence-sensitive display, a presence-sensitive or touch-sensitive input device (such as that shown in FIG. 1), a mouse, a keyboard, a voice responsive system, video camera, microphone or any other type of device for detecting a command from user 120. In some examples, a presence-sensitive display includes a touch-sensitive or presence-sensitive input screen, such as a resistive touchscreen, a surface acoustic wave touchscreen, a capacitive touchscreen, a projective capacitive touchscreen, a pressure sensitive screen, an acoustic pulse recognition touch screen, or another presence-sensitive technology. That is, UI components 102 of computing device 112 may include a presence-sensitive device that may receive tactile input from user 120. UI components 102 may receive indications of the tactile input by detecting one or more gestures from user 120 (e.g., when user 120 touches or points to one or more locations of UI components 102 with a finger or a stylus pen).
[0027] UI components 102 may additionally or alternatively be configured to function as an output device by providing output to user 120 using tactile, audio, or video stimuli. Examples of output devices include a sound card, a video graphics adapter card, or any of one or more display devices, such as a liquid crystal display (LCD), dot matrix display, light emitting diode (LED) display, microLED, miniLED, organic light-emitting diode (OLED) display, e-ink, or similar monochrome or color display capable of outputting visible information to user 120. Additional examples of an output device include a speaker, a haptic device, or other device that can generate intelligible output to a user. For instance, UI components 102 may present output to user 120 as a graphical user interface that may be associated with functionality provided by computing device 112. In this way, UI components 102 may present various user interfaces of applications executing at or accessible by computing device
112 (e.g., an electronic message application, an Internet browser application, etc.). User 120 may interact with a respective user interface of an application to cause computing device 112 to perform operations relating to a function provided by the application.
[0028] In some examples, UI components 102 of computing device 112 may detect two- dimensional and/or three-dimensional gestures as input from user 120. For instance, a sensor of UI components 102 may detect the user's movement (e.g., moving a hand, an arm, a pen, a stylus, etc.) within a threshold distance of the sensor of UI components 102. UI components 102 may determine a two- or three-dimensional vector representation of the movement and correlate the vector representation to a gesture input (e.g., a hand-wave, a pinch, a clap, a pen stroke, etc.) that has multiple dimensions. In other words, UI components 102 may, in some examples, detect a multidimensional gesture without requiring the user to gesture at or near a screen or surface at which UI components 102 output information for display. Instead, UI components 102 may detect a multi-dimensional gesture performed at or near a sensor which may or may not be located near the screen or surface at which UI components 102 output information for display.
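As a rough sketch of the vector-correlation idea above, the snippet below reduces a series of sensed 3D positions to a displacement vector and matches it against stored gesture templates by cosine similarity; the template set and similarity threshold are illustrative assumptions, not part of the disclosure.

```python
import math

def movement_vector(samples: list[tuple[float, float, float]]) -> tuple[float, float, float]:
    # Net displacement from the first to the last sensed position.
    (x0, y0, z0), (x1, y1, z1) = samples[0], samples[-1]
    return (x1 - x0, y1 - y0, z1 - z0)

def cosine(a, b) -> float:
    dot = sum(i * j for i, j in zip(a, b))
    na, nb = math.sqrt(sum(i * i for i in a)), math.sqrt(sum(i * i for i in b))
    return dot / (na * nb) if na and nb else 0.0

def classify_gesture(vec, templates: dict[str, tuple], threshold: float = 0.9):
    # Correlate the observed vector with each stored gesture template and
    # return the best match above the similarity threshold, if any.
    name, template = max(templates.items(), key=lambda kv: cosine(vec, kv[1]))
    return name if cosine(vec, template) >= threshold else None

# Example: a rightward hand-wave template matched against sensed samples.
samples = [(0.0, 0.0, 0.0), (0.2, 0.01, 0.0), (0.5, 0.02, 0.0)]
print(classify_gesture(movement_vector(samples), {"hand-wave": (1.0, 0.0, 0.0)}))
```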
[0029] In the example of FIG. 1, computing system 100 includes user interface (UI) module 104. Module 104 may perform operations described herein using hardware, software, firmware, or a mixture thereof residing in and/or executing at computing system 100. Computing system 100 may execute module 104 with one processor or with multiple processors. In some examples, computing system 100 may execute module 104 as a virtual machine executing on underlying hardware. Module 104 may execute as one or more services of an operating system or computing platform or may execute as one or more executable programs at an application layer of a computing platform.
[0030] UI module 104, as shown in the example of FIG. 1, may be operable by computing system 100 to perform one or more functions, such as receive input and send indications of such input to other components associated with computing system 100. UI module 104 may also receive data from components associated with computing system 100. Using the data received, UI module 104 may cause other components associated with computing system 100, such as UI components 102, to provide output based on the data. For instance, UI module 104 may send data to UI components 102 of computing device 112 to display a graphical user interface (GUI), such as GUI 116.
[0031] In general, user 120 may be provided with an opportunity to provide input to control whether programs or features of computing device 112 and/or computing system 100 can collect and make use of user information (e.g., user 120’s personal data, information about
user 120’s current location, location history, activity, etc.), or to dictate whether and/or how computing device 112 and/or computing system 100 may receive content that may be relevant to user 120. Other user information may include data describing the context of user usage, either obtained from an application itself or from other sources. Examples of usage context may include breadth of share (sharing publicly, or with a large group, or privately, or with a specific person), context of share, etc. When permitted by the user, additional data can include the state of the device, e.g., the location of the device, the apps running on the device, etc. In addition, certain data may be treated in one or more ways before it is stored or used by computing device 112 and/or computing system 100 so that personally identifiable information is removed. For example, a user’s identity may be treated so that no personally identifiable information can be determined about the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined. Thus, user 120 may have control over how information is collected about them and used by computing device 112 and/or computing system 100. For example, user 120 may be prompted by computing device 112 to provide explicit consent for computing device 112 and/or computing system 100 to retrieve and/or store any or all of user 120’s data. In some examples, an action log executed on computing device 112 may provide user 120 a ledger of activity, which may show any automations or applications running in the background of computing device 112, as well as an accurate log of all UI generator module 108 activity.
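A minimal sketch of the data-treatment step described above, assuming a salted one-way hash for identity and simple coordinate rounding for city-level generalization; production systems would use vetted anonymization and geocoding, so treat these helpers as illustrative only.

```python
import hashlib

def pseudonymize_user_id(user_id: str, salt: str) -> str:
    # Replace the raw identity with a one-way hash so that no personally
    # identifiable information is stored or used directly.
    return hashlib.sha256((salt + user_id).encode()).hexdigest()[:16]

def generalize_location(lat: float, lon: float) -> str:
    # Coarsen coordinates to roughly city-scale cells (~0.1 degree) so a
    # particular location of a user cannot be determined.
    return f"cell:{round(lat, 1)},{round(lon, 1)}"
```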
[0032] In the example of FIG. 1, graphical user interface (GUI) 116 may be an example representation of a mobile phone home screen. GUI 116 may include a plurality of user interface elements. For example, as shown in FIG. 1, GUI 116 includes user interface elements 115A-115I, which may be referred to as “widgets”. A widget may be a smaller GUI or GUI element that provides specific functionality or access to a larger application. For example, GUI 116 includes widgets 115A-115I, which may provide user 120 access to one or more applications. For example, widget 115A may be a widget for a messaging application, in which, responsive to user 120 clicking on widget 115A, computing device 112 may open the messaging application for user 120. Widget 115B, for example, may be a widget for a banking application. Widget 115C, for example, may be a widget for a social media application, and widget 115D may be a widget for an Internet browser. As such, computing device 112 may include one or more applications, which may be accessed via one or more widgets displayed on GUI 116.
[0033] In general, the “plurality of functions” described herein may be functions, or
“functionality”, e.g., capabilities or features of an application, that are provided by the values, settings, or other data that are directly embedded into the source code of an application, rather than those that are dynamically generated or configurable at runtime. The “plurality of functions” may include functionality provided by values, logic, etc. that are fixed, e.g., “hard-coded”, in an application’s source code, and cannot be easily changed without modifying the code itself. As such, the “plurality of functions” may be considered statically defined functions, or functions that are predefined at compile time or build time and do not change during execution. The “information associated with a plurality of functions” described herein may refer to data that can be retrieved, e.g., via an API, from one or more applications installed on a computing device, such as computing device 112. For example, an application may include an API that enables external applications or modules to interact with and use the data stored by the application. As such, the “information associated with a plurality of functions included in one or more applications” may be defined as data associated with the predefined or statically defined functionality of the one or more applications, e.g., an API response. As an example, a banking application may include predefined or statically defined functionality for displaying a current balance of a user’s bank account. API module 106 may use the banking application API to retrieve the information associated with the plurality of functions, which may include, for example, a value for the current balance of the user’s bank account, but may not include all of the predefined or statically defined functionality or logic for determining and displaying the value for the current balance of the user’s bank account.
[0034] As such, the one or more applications may be considered to include a plurality of predefined functions. For example, a calculator application may include predefined functionality for performing various arithmetic and mathematical operations, a browser application may include predefined functionality for accessing and browsing the Internet, a banking application may include predefined functionality for transferring funds, etc. As such, many applications executed on computing devices may include predefined functionality for performing various tasks, such as responding to messages, scheduling appointments, booking reservations, browsing the Internet, etc. As an example, if a user wants to book a dinner reservation, they may use a dining application to reserve a table at a particular restaurant. However, the user may also need to use a calendar application to determine what date and time they can book the reservation for, use a map application to find local restaurants, use a web browser application to find reviews for a restaurant, use a messaging application to determine if any friends would like to join the dinner reservation, etc. As such, just to complete a single task, such as booking a dinner reservation, a user may have to navigate
through multiple applications, which may be time-consuming and frustrating for a user. Furthermore, a user may find it difficult to complete tasks due to information being stored across multiple different applications, and due to information having the potential to change over time (e.g., a user may book a reservation at 7:00 PM, but later receive a message from a friend saying that time no longer works with their schedule).
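To make the banking example above concrete, the snippet below shows one plausible shape for an API response carrying the “information associated with a plurality of functions”; the field names and values are assumptions for this sketch rather than a documented schema.

```python
# Hypothetical API response: it exposes a value (the current balance) and
# metadata about predefined functions, but not the application's internal
# logic for computing or displaying that value.
banking_api_response = {
    "app": "banking",
    "functions": [
        {
            "name": "get_current_balance",
            "description": "Return the current balance of the user's account",
            "result": {"current_balance": 1523.40, "currency": "USD"},
        },
        {
            "name": "transfer_funds",
            "description": "Transfer funds between two accounts",
            "params": ["from_account", "to_account", "amount"],
        },
    ],
}
```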
[0035] Therefore, users may benefit from custom user interfaces and widgets that are dynamically generated based on identified tasks, in which the custom user interfaces may be organized into different categories, and the custom widgets may enable users to access desired functionality for performing the identified tasks. For example, rather than user 120 having to navigate through multiple user interfaces in multiple applications to access desired information and/or functionality and complete multiple tasks, user 120 may simply say their intent or command, i.e., provide natural language input 114, and computing system 100 may provide instructions for generating the multiple organized user interfaces with widgets for completing the multiple tasks.
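As an illustration of the category-organized output described above, a generated instruction set might group widgets by category as in the sketch below; the format is an assumption for illustration, reusing the tasks from the earlier “Family”/“Work” example.

```python
generated_ui_instructions = {
    "Family": {
        "widgets": [
            {"label": "Message Jenny re: childcare", "function": "messaging.send_message"},
            {"label": "Book appointment for Jane", "function": "health.book_appointment"},
        ],
    },
    "Work": {
        "widgets": [
            {"label": "Schedule meeting with John", "function": "calendar.create_event"},
        ],
    },
}
```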
[0036] In accordance with techniques of this disclosure, computing system 100 may include a user interface generator module 108 that applies a large language model to natural language input in order to dynamically generate custom user interfaces and functionality for performing various tasks. Specifically, with explicit consent from user 120, user interface generator module 108 may retrieve, via API module 106, information (e.g., API response data) associated with the plurality of functions included in the one or more applications executing at computing device 112, such as applications associated with and/or accessed via widgets 115A-115I.
[0037] In general, with explicit consent from user 120, user interface generator module 108 may run continuously and be configured to monitor the content of one or more applications and/or user activity. In an example involving one or more applications executing on computing device 112, with explicit consent from user 120, user interface generator module 108 may run continuously in the background of computing device 112 and be configured to monitor the content of one or more applications executing at computing device 112 and/or user activity within computing device 112. In other words, API module 106 receives explicit consent from user 120 to gather information from user 120 and one or more applications executing on computing device 112 operated by user 120. In general, user interface generator module 108 may receive an indication of a natural language user input 114 associated with one or more predefined or already available functions included in the one or more applications, again provided that user 120 has given explicit permission for computing system 100 to
monitor/receive user 120’s data.
[0038] In general, API module 106, which can be considered an API library, may include multiple APIs that can be used to access one or more application APIs. In some examples, API module 106 may provide information about user interface elements, events, and actions to assistive technologies (e.g., screen readers, magnification gestures, switch devices, etc.) provided by computing system 100 or computing device 112. In some examples, API module 106 may be configured to enable the exchanging of data in a standardized format. For example, API module 106 may support REST (Representational State Transfer), which is a widely-used architectural style for building APIs that use HTTP (Hypertext Transfer Protocol) to exchange data between applications.
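As a non-limiting illustration of such a standardized, REST-style exchange, the following Python sketch issues an HTTP GET request and parses a JSON response using the widely available requests library; the endpoint URL and route are hypothetical.

    import requests  # widely used HTTP client library

    # Hypothetical endpoint; a real application API would document its own routes.
    BASE_URL = "http://localhost:8080/api/v1"

    def fetch_functions(app_name: str) -> list:
        # REST-style exchange: an HTTP GET returning JSON in a standardized format.
        response = requests.get(f"{BASE_URL}/apps/{app_name}/functions", timeout=5)
        response.raise_for_status()
        return response.json()

    # Example usage (assumes a server is running at BASE_URL):
    # functions = fetch_functions("banking")
    # print(functions)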
[0039] In some examples, API module 106 may be configured to generate a stream of accessibility events as the user interacts with computing device 112 and applications executed on computing device 112. In some examples, these events may represent actions and changes in a user interface, such as button presses, text changes, and screen transitions. With explicit consent from user 120, user interface generator module 108 may receive and analyze these events to better understand how user 120 interacts with an application executing on computing device 112.
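By way of illustration, the following Python sketch models a stream of accessibility events and a simple analysis of interaction frequency of the kind user interface generator module 108 might perform with user consent; the AccessibilityEvent record and its field names are hypothetical simplifications.

    from dataclasses import dataclass
    from collections import Counter
    from typing import Iterable

    @dataclass
    class AccessibilityEvent:
        # Hypothetical event record; real platforms define richer event types.
        event_type: str   # e.g., "button_press", "text_change", "screen_transition"
        component_id: str
        timestamp: float

    def summarize_events(events: Iterable[AccessibilityEvent]) -> Counter:
        # Count how often each kind of interaction occurs, which a UI generator
        # could use (with consent) to understand how a user interacts with an app.
        return Counter(e.event_type for e in events)

    events = [
        AccessibilityEvent("button_press", "send_button", 0.0),
        AccessibilityEvent("text_change", "amount_field", 1.2),
        AccessibilityEvent("button_press", "send_button", 3.4),
    ]
    print(summarize_events(events))  # Counter({'button_press': 2, 'text_change': 1})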
[0040] API module 106 may be configured to retrieve accessibility actions from applications executed on computing device 112. “Accessibility actions” may refer to different types of inputs that can be detected at a location associated with a UI component 102, such as mechanical inputs (e.g., a clicking of a button, a swiping of a screen, etc.), audio input (e.g., verbal command), or gesture control (e.g., triple tapping on a screen, hand wave, assistive gestures, etc.). As such, accessibility actions may provide users the ability to interact with an application or user interface element in multiple ways according to their needs. In some examples, with explicit consent from user 120, computing system 100 may determine which accessibility actions are frequently performed by user 120 when interacting with a GUI or application such that the new user interface generated by user interface generator module 108 can be better tailored for user 120’s needs. In some examples, the information retrieved by API module 106 from computing device 112 may be stored by computing system 100 to identify potential accessibility issues and/or better understand how user 120 interacts with computing device 112. In some examples, user interface generator module 108 may use information retrieved from computing device 112 to determine the format, size, color scheme, accessibility features, or any other features to include in the set of instructions (e.g., new code) for generating new graphical user interfaces, components, and functionality for
performing tasks. In some examples, user interface generator module 108 may also provide users the ability to configure various accessibility and/or display options according to their needs. For example, user 120 may be able to adjust the user interface elements of a GUI, such as text size, enable color correction, set up magnification gestures, and configure gesture-based navigation.
[0041] In general, user interface generator module 108 may send information (e.g., location information, other contextual information, etc.) to ML module 110 only if computing system 100 receives permission from the user of computing device 112 to send the information. For example, in situations discussed here in which computing system 100 and/or computing device 112 may collect, transmit, or make use of personal information about a user (e.g., location information, financial information, etc.), the user may be provided with an opportunity to control whether programs or features of computing system 100 can collect user information (e.g., information about a user’s social network, a user’s social actions or activities, a user’s profession, a user’s preferences, or a user’s current location), or to control whether and/or how computing system 100 and/or computing device 112 may store and share user information. In addition, certain data may be treated in one or more ways before it is stored, transmitted, or used so that personally identifiable information is removed. For example, a user’s identity may be treated so that no personally identifiable information can be determined about the user. Thus, the user may have control over how information is collected about the user and stored, transmitted, and/or used in accordance with techniques of this disclosure.
[0042] In general, user interface generator module 108 may receive, from computing device 112, and provided that user 120 has given explicit consent, an indication of a natural language user input 114 (e.g., audio or text input from user 120) associated with the plurality of functions included in the one or more applications. In other words, the indication of a natural language user input may represent user 120’s command or intent, and/or desired functionality for one or more applications. For example, as shown in the example of FIG. 1, natural language user input 114 may include a natural language utterance such as “Send money to Mike, book Jane’s appointment...” As such, in some examples, natural language user input 114 may represent user 120’s commands and/or desires for performing one or more tasks, such as transferring funds, booking an appointment, viewing their bank account balance, etc. In some examples, user 120 may provide natural language input that represents any number of commands or intents. That is, user 120 may say aloud any number of tasks in
a single utterance, which may include tasks pertaining to different functionality included in different applications.
[0043] In general, API module 106 may be configured to retrieve information (e.g., data) using one or more application APIs for the applications executing on computing device 112, which user interface generator module 108 may interpret in order to understand the functionality provided by the one or more applications. User interface generator module 108 may further use the retrieved information to contextualize the indication of natural language user input 114 when applying machine learning module 110. As one example, a natural language user input may include a natural language utterance such as, “Send the money to Mike.” In this example, while Mike is explicitly deemed the recipient of the money, the user has not specified an amount of money to send. However, user interface generator module 108 may retrieve, using API module 106, information associated with predefined functions included in, for example, a messaging application and a banking application. User interface generator module 108 may receive, with explicit consent from user 120, data from the applications, such as the content of a message received within the messaging application, and a list of a user’s trusted contacts stored within the banking application. User interface generator module 108 may retrieve, for example, data indicative of a message received from Mike R. that includes the phrase, “Can you send me $20?”, and a username associated with Mike R.’s banking application profile. Therefore, computing system 100 may determine that the input command of “Send the money to Mike” indicates a task of sending $20 to Mike R. using the functionality of the banking application. As such, with explicit user consent, computing system 100 may perform tasks using context information and/or user data sourced from one or more applications included in computing device 112. In this way, in some examples, computing system 100 may interpret and complete natural language input without having to request that users provide additional input for clarification.
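The following Python sketch illustrates, under simplified and hypothetical assumptions, the disambiguation step described above: a command that names a recipient but omits an amount is completed using message content retrieved from another application. The parsing heuristics here are deliberately minimal and illustrative only.

    import re

    def disambiguate_amount(command: str, messages: list[dict]) -> dict | None:
        # Hypothetical disambiguation step: if the command names a recipient
        # but no amount, look for a matching request in recent messages.
        recipient = command.rsplit(" ", 1)[-1].rstrip(".")
        for message in messages:
            if message["sender"].split()[0].lower() == recipient.lower():
                match = re.search(r"\$(\d+(?:\.\d{2})?)", message["text"])
                if match:
                    return {"recipient": message["sender"],
                            "amount": float(match.group(1))}
        return None

    messages = [{"sender": "Mike R.", "text": "Can you send me $20?"}]
    print(disambiguate_amount("Send the money to Mike", messages))
    # {'recipient': 'Mike R.', 'amount': 20.0}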
[0044] In some examples, user interface generator module 108 may apply machine learning module 110, which may include a language model configured to perform natural language processing techniques, to the indication of natural language user input 114 to identify one or more tasks. In some examples, a prompt may be provided to machine learning module 110 along with the user input, e.g., a string input such as “Only output in the specified format, no comments or explanations. A user has dictated the following to-do items: [Send money to Mike and book Jane’s appointment. Ring doctor to reschedule appointment. Where is the dinner reservation?] The punctuation is not correct, there might be missing periods between items and some items may have been incorrectly combined. Please correct the punctuation
and split it into separate items. There might be one or a few items. Output as a markdown list, like: [-Task 1; -Task 2; -Task 3]. Your response should begin with...” For example, in the example of natural language user input 114 including a natural language utterance such as “Send money to Mike and book Jane’s appointment,” machine learning module 110 may parse through the indication of natural language user input 114 to identify a first task, “Send money to Mike,” and a second task, “Book Jane’s appointment.” In general, machine learning module 110 may parse through input including any amount of data, i.e., machine learning module 110 may identify any number of tasks in a single natural language user input 114. The output of machine learning module 110 may be in a structured format or a semi-structured format.
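As a non-limiting illustration, the following Python sketch assembles a task-splitting prompt of the kind described above (with abbreviated, hypothetical wording) and parses the model’s markdown-list output into individual task strings.

    def build_split_prompt(dictation: str) -> str:
        # Assemble the task-splitting prompt; the exact wording here is an
        # abbreviated, hypothetical variant of the prompt described above.
        return ("Only output in the specified format, no comments or explanations. "
                f"A user has dictated the following to-do items: [{dictation}] "
                "Please correct the punctuation and split it into separate items. "
                "Output as a markdown list.")

    def parse_task_list(model_output: str) -> list:
        # Parse a markdown list such as "- Task 1\n- Task 2" into task strings.
        return [line.lstrip("- ").strip()
                for line in model_output.splitlines()
                if line.strip().startswith("-")]

    # Example round trip with a hypothetical model response:
    output = "- Send money to Mike\n- Book Jane's appointment"
    print(parse_task_list(output))  # ['Send money to Mike', "Book Jane's appointment"]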
[0045] In general, to better organize information and actions for completing various tasks, machine learning module 110 may further identify, for each task, one or more associated categories, e.g., “Headspaces.” Example categories may include, but are not limited to, “Family,” “Banking,” “Food,” “Travel,” “Friends,” “Leisure,” etc. Furthermore, in some examples, the categories may be customized by a user, and/or may be determined from a predefined list of categories. In general, to determine the one or more associated categories, a list of the identified tasks may be provided as input to machine learning module 110 along with a prompt, e.g., a prompt such as “Write your response in the following format: [# Headspace name; - Task name; - Task name; - Task name; # Headspace name; - Task name; - Task name; - Task name;]. Your response should begin with ‘...’. Here are some tasks: [- Send money to Mike; - Book Jane’s appointment]. Please group these tasks into headspaces. Headspaces should contain at least 2 tasks. When naming headspaces, use a vibe, gen-z style, ideally 1 word, no more than 2-3 words. Aim for the minimum number of headspaces that make sense. No more than 5 headspaces. Note: Some of the tasks will be phrased as a question. This is fine, just leave the name as-is, and treat it as a task to answer that question. The following context information was found in the memory of the device, which may or may not be relevant: [- The user has a husband called Mike; - The user has a daughter aged 3 called Jane].” In this example, based on the prompt, machine learning module 110 may determine the first task and the second task to be associated with a “Family” category, as, based on the context information retrieved from computing device 112, machine learning module 110 may determine Mike and Jane to be family members of user 120. As such, in general, input provided to machine learning module 110 may include contextual data, or be “injected with memory” that provides context for other input data, such as the identified tasks.
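The following Python sketch illustrates one hypothetical way to inject identified tasks and device “memory” into the category-grouping prompt and to parse the “# Headspace” / “- Task” output format described above; the prompt wording is abbreviated.

    def build_grouping_prompt(tasks: list, memory: list) -> str:
        # Inject identified tasks and device "memory" (context) into the
        # category-grouping prompt; wording abbreviated and hypothetical.
        task_block = "; ".join(f"- {t}" for t in tasks)
        memory_block = "; ".join(f"- {m}" for m in memory)
        return ("Please group these tasks into headspaces: "
                f"[{task_block}]. The following context information was found in "
                f"the memory of the device: [{memory_block}].")

    def parse_headspaces(model_output: str) -> dict:
        # Parse "# Headspace" headers followed by "- Task" lines into a mapping.
        groups, current = {}, None
        for line in model_output.splitlines():
            line = line.strip()
            if line.startswith("#"):
                current = line.lstrip("# ").strip()
                groups[current] = []
            elif line.startswith("-") and current is not None:
                groups[current].append(line.lstrip("- ").strip())
        return groups

    output = "# Family\n- Send money to Mike\n- Book Jane's appointment"
    print(parse_headspaces(output))
    # {'Family': ['Send money to Mike', "Book Jane's appointment"]}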
[0046] In some examples, the determined categories may be sorted based on a respective level of priority. For example, an additional input may be provided to machine learning module 110, e.g., a prompt such as “Here are the headspaces and tasks that the user created: [# Adulting; - Ring doctor to reschedule appointment.; - Pay the utility bill.; - Check my savings.; # Squad; - Book Jane’s flu shot.; - Order vests for Jane.; - Ask Jenny if she can look after the kids on Tuesday.; # Travel; - Book a train north.; - Where is the hotel located?; # FixIt; - Ring John the plumber.; - Ask Ian to use sharp sand in the mortar.; - Clean rear cassette.]. Please reorder these headspaces and tasks. Start with the most important headspaces first, and within each, the most important tasks first. For context, the user is currently at work. Do not change the names of the headspaces or tasks. The following context information was found in the memory of the device, which may or may not be relevant: [- The user has a daughter aged 3 called Jane.]” In this example, based on the user’s current state, e.g., the user currently being at work, and the context information retrieved from computing device 112, machine learning module 110 may sort the categories from a level of highest priority to lowest priority as follows: “Adulting,” “Squad,” “FixIt,” “Travel.” In some examples, a Levenshtein distance algorithm may be used by machine learning module 110 to match the sorted categories and their associated tasks with existing identified tasks.
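By way of illustration, the following Python sketch shows a standard Levenshtein distance computation and how it might be used to match a model-returned task name back to an existing identified task, as described above; the matching policy (closest edit distance) is an illustrative assumption.

    def levenshtein(a: str, b: str) -> int:
        # Standard dynamic-programming edit distance.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,               # deletion
                                curr[j - 1] + 1,           # insertion
                                prev[j - 1] + (ca != cb))) # substitution
            prev = curr
        return prev[-1]

    def match_task(sorted_name: str, existing_tasks: list) -> str:
        # Match a model-returned task name to the closest existing task,
        # tolerating small wording changes introduced by the model.
        return min(existing_tasks, key=lambda t: levenshtein(sorted_name, t))

    existing = ["Ring doctor to reschedule appointment.", "Pay the utility bill."]
    print(match_task("Ring the doctor to reschedule appointment", existing))
    # Ring doctor to reschedule appointment.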
[0047] Using the information associated with the plurality of functions, computing system 100 may apply machine learning module 110 to the one or more identified tasks to generate a set of instructions, e.g., code. In general, machine learning module 110 may generate the set of instructions using a large language model, in which the set of instructions may be generated based on one or more of application functionality, capabilities, and/or attributes included in the information associated with the plurality of functions, contextual information (e.g., user data), and user input received by the computing system. That is, using the information associated with the plurality of functions, a prompt may be generated by machine learning module 110, in which the prompt may specify output format (e.g., JavaScript code), allowed data types, a UI component library that can be used to build an end result UI, an API library including APIs that can be used to retrieve data at runtime (e.g., predefined APIs or “task APIs” configured to retrieve the information associated with the plurality of functions, APIs for accessing sub-LLMs, sub-prompts for disambiguation steps such as “Which Jenny?”, etc.), user input (e.g., the identified tasks), and context information (e.g., relevant user data) such as “The following things were found in the memory of the device, which may or may not be relevant: [- The user has a calendar event in their diary for a doctor's appointment in 2 days time. Its ID is 'a3be'.; - If presenting a call button, the user might like
to refer to the appointment details when calling the doctor.].” The prompt may include additional instructions. For example, an example prompt may include instructions such as, “The user has made a note of a job they want to do. Your job is to present TaskUIComponent(s) to help them get their task done. You're not in charge of completing the task, you're just presenting UI components that will help them get the task done. The to-do item is: ‘Ring doctor to reschedule appointment.’ Note: if it's not possible to present any UI, rather than displaying a 'text-output' component, it's better to raise an error. However, if you can fall back to a simpler UI, that's better than an error. The code you write should define an asynchronous function called ‘perform’ with arguments ‘task’, ‘dateLib’. You do not need to use all the arguments. Once complete, return an array of TaskUIComponents. The first thing your code should do is call ‘task.setTitle’ with a title for the task. The title should be no more than 2-3 words long.”
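The following Python sketch illustrates, with hypothetical names and abbreviated wording, how such a code-generation prompt might be assembled, and shows a hypothetical model response in which the generated JavaScript instructions are returned as a string; the getCalendarEvent, placeCall, and CallButton identifiers are illustrative assumptions, not an actual component or API library.

    def build_codegen_prompt(task: str, memory: list) -> str:
        # Abbreviated, hypothetical variant of the code-generation prompt:
        # it names the output format, the allowed component library, the
        # task APIs available at runtime, the task, and injected memory.
        memory_block = "; ".join(f"- {m}" for m in memory)
        return ("Output format: javascript. Allowed components: the "
                "TaskUIComponent library. Available task APIs: "
                "getCalendarEvent, placeCall. Define an asynchronous function "
                "called 'perform' with arguments 'task', 'dateLib'. "
                f"The to-do item is: '{task}'. The following things were found "
                "in the memory of the device, which may or may not be "
                f"relevant: [{memory_block}].")

    # A hypothetical model response: generated JavaScript returned as a string,
    # which the computing system could later execute in a sandboxed runtime.
    generated_instructions = """
    async function perform(task, dateLib) {
      task.setTitle('Reschedule appt');
      const event = await getCalendarEvent('a3be');
      return [new CallButton({number: event.phone, label: 'Call doctor'})];
    }
    """

    prompt = build_codegen_prompt(
        "Ring doctor to reschedule appointment.",
        ["The user has a calendar event in their diary for a doctor's "
         "appointment in 2 days time. Its ID is 'a3be'."])
    print(prompt[:60])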
[0048] As such, the set of instructions may be, for example, generated JavaScript code that returns one or more UI components from the UI component library, in which the UI components may display or use information retrieved by API module 106. In this way, the UI components from the UI component library may be considered custom user interfaces and widgets that are dynamically generated based on identified tasks, in which the custom user interfaces may be organized into different categories, and the custom widgets may enable users to access functionality for performing the identified tasks.
[0049] Furthermore, in general, the set of instructions may be dynamically generated at runtime based on user input and retrieved information, including data associated with the predefined or statically defined functions, capabilities, or features from the one or more applications. That is, the set of instructions may include dynamically generated or configurable functionality that may adapt or change based on input data and/or other conditions at runtime. In some examples, the set of instructions may include combined functionality, e.g., functions from the one or more applications that are combined with other functions from the one or more applications to provide functionality for performing an identified task. As such, the set of instructions may be considered generated code that provides corresponding graphical user interfaces and application functionality based on user input.
[0050] The set of instructions may be associated with or provide at least one function for performing a respective task from the one or more tasks, e.g., the set of instructions may provide a user’s desired functionality for completing the one or more tasks. As such, even if user 120’s desired functionality for performing tasks is not predefined by a particular
application, i.e., included in the plurality of predefined functions, computing system 100 may generate new code that provides user 120’s desired functionality, so long as the desired functionality is determined to be a possible functionality for the one or more applications (e.g., machine learning module 110 may determine whether the desired functionality is reasonable for the one or more applications). For example, continuing the example above for sending money to Mike, computing system 100 may use data retrieved from the messaging application and the banking application to generate new code that provides functionality for sending $20 to Mike by, for example, user 120 interacting with a single graphical component, such as a button. As such, the set of instructions may further include instructions for generating at least one graphical user interface associated with the respective category, in which the at least one GUI associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task. That is, continuing the example, computing system 100 may generate instructions for generating a GUI associated with a “Family” category, in which the “Family” GUI may include a widget that enables user 120 to send $20 to Mike through the click of a button. In general, the “category GUIs” described herein may be considered visual spaces each associated with a category from any number of categories (e.g., “Family,” “Work,” “Travel,” etc.), in which each category may be identified by parsing user intent. In some examples, computing system 100 may determine a category from a predetermined list of categories for each identified task.
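As a non-limiting illustration, the following Python sketch models a category GUI containing a single generated widget whose activation performs the combined messaging/banking task described above; the TaskUIComponent and CategoryGUI structures are hypothetical simplifications.

    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class TaskUIComponent:
        # Hypothetical UI component produced by the generated instructions.
        label: str
        on_activate: Callable[[], None]

    @dataclass
    class CategoryGUI:
        # A visual space associated with one category (e.g., "Family").
        category: str
        components: List[TaskUIComponent] = field(default_factory=list)

    def send_twenty_to_mike() -> None:
        # Stand-in for the generated, combined messaging/banking functionality.
        print("Sent $20 to Mike R.")

    family_gui = CategoryGUI(
        "Family",
        [TaskUIComponent("Send $20 to Mike", send_twenty_to_mike)])
    family_gui.components[0].on_activate()  # a single interaction completes the task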
[0051] In this way, the techniques described herein may also provide users a “shortcut” for performing tasks and accessing various application functionality. Furthermore, as shown in the example of FIG. 1, computing system 100 may be configured to receive an indication of natural language input 114 based on, for example, a “touch and talk” feature, rather than by the user navigating through the one or more applications. More specifically, in some examples, computing system 100 receives the indication of natural language user input 114 from computing device 112 in response to a gesture detected at a location of a presence-sensitive display of computing device 112, e.g., a location that corresponds to a graphical user interface component used for causing computing system 100 to perform the techniques described herein. For example, in the example of the touch and talk feature, user 120 may hold down on widget 118 with their finger, in which widget 118 may be a widget designated for triggering the techniques attributed to computing system 100, and may be displayed on a home screen (e.g., GUI 116) of computing device 112. While holding down on widget 118, user 120 may provide natural language input 114 such as, “Send money to Mike, book Jane’s
appointment,” in which holding down on widget 118 may be a gesture that causes a user interface component 102 (e.g., a microphone) of computing device 112 to capture natural language input 114.
[0052] In some examples, the gesture may be provided mechanically (such as by pressing a button) or by gesture recognition/control (such as triple tapping on a screen). In some examples, the indication of a gesture may be an audible input, whereby the gesture is provided by user 120 via, for example, voice command. In some examples, the indication of the gesture is provided by user 120 by using gesture control, such as by providing the gestures described above (e.g., a hand-wave, a pinch, a clap, a pen stroke, etc.) or by tapping the screen in a certain manner (e.g., triple tapping the screen). Therefore, the techniques described herein may be executed by computing system 100 in response to an indication of a variety of gestures. In this way, a user is not limited to performing one specific gesture in order to receive desired functionality for an application, which may make the application much more accessible and user-friendly. While user 120 may be able to perform tasks themselves by interacting with, e.g., widgets 115A-115I to access one or more applications, computing system 100 may generate instructions for performing the tasks based on user 120 performing a simple gesture, such as holding down on widget 118 and speaking their intent. In this way, users may not be required to navigate through applications to find their desired functionality or perform various tasks. That is, the techniques described herein may provide user 120 with a mechanism to “shortcut” the complexity of performing various actions for various tasks.
[0053] Computing system 100 may send the set of instructions to computing device 112, in which computing device 112 may use the set of instructions to generate the at least one GUI associated with a respective category. For example, computing device 112 may use the set of instructions to generate a “Family” GUI, a “Work” GUI, a “Friends” GUI, etc., in which each GUI further includes at least one graphical component (e.g., a widget) associated with at least one function for performing a respective task. For example, the “Family” GUI may include a widget that enables a user to send $20 to Mike by simply clicking a “Send” button included within the widget. Furthermore, in some examples, the set of instructions may include instructions for generating the different GUIs in an order based on a level of importance assigned to each category. For example, historical data (e.g., user data) retrieved from computing device 112 may indicate a level of priority for actions and tasks. As an example, historical data retrieved from computing device 112 may indicate that user 120 frequently sends and receives messages to and from contacts deemed as family members, frequently
performs actions within applications that involve said contacts, etc. As such, computing system 100 may determine a level of priority for each task associated with a respective category, and may determine, e.g., based on the priority levels of the associated tasks, an overall level of importance for the respective category. Continuing the example, computing system 100 may determine the “Family” category to have the highest level of importance. As such, the set of instructions may include instructions for generating, e.g., the “Family” GUI as a first GUI in an order of GUIs, a “Work” GUI as a second GUI in the order of GUIs, etc. For example, GUI 116 may be an example mobile phone home screen, and user 120 may swipe horizontally across GUI 116 to view the “Family” GUI, may swipe horizontally across the “Family” GUI to view the “Work” GUI, and so on. As such, each category GUI may be presented as its own screen, and may be presented in an order based on a level of importance, so as to better organize and prioritize the actions for completing a user’s multiple tasks.
[0054] In this respect, various aspects of the techniques described in this disclosure may facilitate better user experience with applications executing on user devices. Specifically, smaller, more organized, and customizable widgets that provide users access to functionality of one or more larger applications may reduce the amount of time and effort required by a user to access such functionality when trying to complete tasks. The techniques described may also provide more assistance to users with disabilities when interacting with devices and applications. Furthermore, provided that the techniques described include generating new code based on user intent, users may be able to personalize the functionality of applications with which they interact without requiring a developer of the application to hard-code additional features or otherwise update the application. Additionally, users may find that organizing tasks based on associated categories is helpful for completing tasks in a less convoluted manner.
[0055] FIG. 2 is a block diagram illustrating another example computing system configured to apply a machine learning module to natural language text and audio, in accordance with one or more techniques of this disclosure. As shown in the example of FIG. 2, computing system 200 includes processors 224, one or more communication channels 230, one or more user interface components (UIC) 232, one or more communication units 228, and one or more storage devices 238. Storage devices 238 of computing system 200 may include user interface module 204 and user interface generator module 208. As shown in the example of FIG. 2, user interface generator module 208 further includes API module 206, machine learning module 210, speech-to-text module 226, and instructions storage 222.
[0056] Some or all of the components and/or functionality attributed to computing system 200 may be implemented or performed by a computing device in communication with computing system 200. Computing system 200, user interface module 204, user interface generator module 208, API module 206, machine learning module 210, and user interface (UI) components 202 may be similar if not substantially similar to computing system 100, user interface module 104, user interface generator module 108, API module 106, machine learning module 110, and user interface (UI) components 102 of FIG. 1, respectively.
[0057] The one or more communication units 228 of computing system 200, for example, may communicate with external devices by transmitting and/or receiving data at computing system 200, such as to and from remote computer systems or computing devices. Example communication units 228 include a network interface card (e.g., an Ethernet card), an optical transceiver, a radio frequency transceiver, or any other type of device that can send and/or receive information. Other examples of communication units 228 may be devices configured to transmit and receive signals via Ultrawideband®, Bluetooth®, GPS, 3G, 4G, Wi-Fi®, etc., such as those found in computing devices such as mobile devices and the like.
[0058] As shown in the example of FIG. 2, communication channels 230 may interconnect each of the components as shown for inter-component communications (physically, communicatively, and/or operatively). In some examples, communication channels 230 may include a system bus, a network connection (e.g., a wireless connection), one or more inter-process communication data structures, or any other components for communicating data between hardware and/or software locally or remotely.
[0059] One or more I/O devices 234 of computing system 200 may receive inputs and generate outputs. Examples of inputs are tactile, audio, kinetic, and optical input, to name only a few examples. Input devices of I/O devices 234, in one example, may include a touchscreen, a touchpad, a mouse, a keyboard, a voice responsive system, a video camera, buttons, a control pad, a microphone, or any other type of device for detecting input from a human or machine. Output devices of I/O devices 234 may include a sound card, a video graphics adapter card, a speaker, a display, or any other type of device for generating output to a human or machine.
[0060] User interface module 204, user interface generator module 208, API module 206, machine learning module 210, speech-to-text module 226, and instructions storage 222 (hereinafter “modules 204-226”) may perform operations described herein using software, hardware, firmware, or a mixture of hardware, software, and firmware residing in and executing on computing system 200 or at one or more other computing devices (e.g., a cloud-
based application - not shown). For example, some or all of modules 204-226 may be included in and executable on a local computing device, such as computing device 112 of FIG. 1. As such, the techniques described herein may all be implemented locally on a computing device.
[0061] Computing system 200 may execute one or more of modules 204-226 with one or more processors 224, or may execute any or part of one or more of modules 204-226 as or within a virtual machine executing on underlying hardware. One or more of modules 204-226 may be implemented in various ways, for example, as a downloadable or pre-installed application, remotely as a cloud application, or as part of the operating system of computing system 200. Other examples of computing system 200 that implement techniques of this disclosure may include additional components not shown in FIG. 2.
[0062] In the example of FIG. 2, one or more processors 224 may implement functionality and/or execute instructions within computing system 200. For example, one or more processors 224 may receive and execute instructions that provide the functionality of UIC 232, communication units 228, one or more storage devices 238, and an operating system to perform one or more operations as described herein. For example, one or more processors 224 may receive and execute instructions that provide the functionality of some or all of modules 204-226 to perform one or more operations and various functions described herein. The one or more processors 224 may include, but are not limited to, a central processing unit (CPU), a digital signal processor (DSP), a general-purpose microprocessor, a tensor processing unit (TPU), a neural processing unit (NPU), a neural processing engine, a core of a CPU, VPU, GPU, TPU, NPU, or another processing device, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other equivalent integrated or discrete logic circuitry.
[0063] One or more storage devices 238 within computing system 200 may store information, such as information retrieved from a user computing device, or other data discussed herein, for processing during the operation of computing system 200. In some examples, one or more storage devices of storage devices 238 may be a volatile or temporary memory. Examples of volatile memories include random access memories (RAM), dynamic random-access memories (DRAM), static random-access memories (SRAM), and other forms of volatile memories known in the art. Storage devices 238, in some examples, may also include one or more computer-readable storage media. Storage devices 238 may be configured to store larger amounts of information for longer terms in non-volatile memory
than volatile memory. Examples of non-volatile memories include magnetic hard disks, optical discs, floppy discs, flash memories, or forms of electrically programmable read-only memory (EPROM) or electrically erasable programmable read-only memory (EEPROM). Storage devices 238 may store program instructions and/or data associated with the modules 204-226 of FIG. 2.
[0064] In general, with explicit consent from a user, computing system 200 may retrieve, using API module 206, information (e.g., API response data) associated with a plurality of functions included in one or more applications executing at a computing device. UI module 204 may receive an indication of a natural language user input associated with the plurality of functions. For example, the plurality of functions may include some or all of the functions that are predefined, e.g., by application developers, in the one or more applications executing at the computing device. In some examples, computing system 200 may retrieve data, e.g., user data, and/or context information from the one or more applications executing at the computing device, and/or the computing device itself. For example, the context information may include, but is not limited to, device location data, device information, network information, connectivity information, application usage data, environmental data, user preference data, battery status, sensor data, application permissions, calendar events, notification data, etc. The indication of the natural language user input may be associated with one or more functions from the plurality of functions. For example, the natural language user input may include an utterance such as, “Call electrician,” which may be associated with functionality for making a phone call, which may already be predefined in a phone application included in a smartphone.
[0065] In some examples, the indication of the natural language user input may be received by UI module 204 from the computing device in response to a gesture detected at a location of a presence-sensitive display of the computing device. In other words, a user may use a “touch and talk” feature on the computing device, in which the indication of the natural language user input is captured by the computing device and sent to UI module 204. UI module 204 may further interpret the indication or other inputs detected at the computing device. UI module 204 may relay information about the inputs detected at the computing device to one or more associated platforms, operating systems, applications, and/or services executing at the computing device to cause the computing device to perform a function. For example, if UI module 204 is unable to interpret the indication or other inputs, UI module 204 may relay information to the computing device in which the computing device may request the user to repeat or clarify the indication or other inputs. In some examples, UI
module 204 may determine whether the indication of a natural language user input is associated with one or more functions from the plurality of functions included in the one or more applications executing at the computing device. In other words, UI module 204 may determine whether the indication and/or other inputs are associated with the capabilities and/or functionality of the applications, such that the user’s desired functionality for completing tasks can be generated. For example, if UI module 204 receives user input requesting to “Send a message to Joe M.,” but determines that the user’s contacts application does not include contact information for a Joe M., UI module 204 may determine that functionality for performing the task of sending a message to Joe M. cannot be generated by computing system 200. UI module 204 may then relay information to the computing device indicating this error, in which the computing device may further relay this error to the user.
[0066] UI module 204 may also receive information and instructions from one or more associated platforms, operating systems, applications, and/or services executing at the computing device (e.g., user interface generator module 208) for generating a file comprising the set of instructions. In general, the set of instructions may provide at least one function for performing a respective task from one or more tasks identified in the user input. The set of instructions may further include instructions for generating at least one graphical user interface associated with a respective category, in which at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task. In some examples, UI module 204 may act as an intermediary between the one or more associated platforms, operating systems, applications, and/or services executing at the computing device and various output devices of the computing device (e.g., speakers, LED indicators, vibrators, etc.) to produce output (e.g., graphical, audible, tactile, etc.) with the computing device.
[0067] In some examples, user interface generator module 208 may be implemented on a computing device in various ways. For example, user interface generator module 208 may be implemented as a downloadable or pre-installed application or “app.” In another example, user interface generator module 208 may be implemented as part of an operating system of a computing device.
[0068] Instructions storage 222 may be a storage repository for the information associated with the plurality of functions included in the one or more applications executing at the computing device that are retrieved by API module 206. In general, the information associated with the plurality of functions may include API response data, in which the API response data is associated with one or more capabilities and/or functionality of an
application that are predefined at compile time. Instructions storage 222 may also store, with explicit user consent, context data and/or other data (e.g., user data) retrieved from a computing device (e.g., computing device 112 of FIG. 1) by API module 206.
[0069] Information may be stored in instructions storage 222 for use by other modules of user interface generator module 208, such as machine learning module 210. In some examples, instructions storage 222 may operate, at least in part, as a cache for instructions retrieved from a computing device (e.g., using one or more communication units 228) or other computing devices. In general, instructions storage 222 may be configured as a database, flat file, table, or other data structure stored within storage device 238. In some examples, instructions storage 222 is shared between various modules executing at computing system 200 (e.g., between one or more of modules 204-226 or other modules not shown in FIG. 2). In other examples, a different data repository is configured for a module executing at computing system 200 that requires a data repository. Each data repository may be configured and managed by different modules and may store data in a different manner. In some examples, computing system 200 may receive and store information from a computing device over a specified period of time.
[0070] In the example of FIG. 2, user interface generator module 208 may receive, from UI module 204, the indication of a natural language user input, which may be an audio or text input from a user operating a computing device. In examples where the user input is an audio input (e.g., comprising spoken language), speech-to-text module 226 may convert the input into a computer-readable format. Speech-to-text module 226 may implement an Automatic Speech Recognition (ASR) system to convert an audio input (e.g., a digital audio signal) into written text. In some examples, speech-to-text module 226 may preprocess the audio input to enhance quality and remove noise by normalizing the audio volume and filtering out any background noise. Speech-to-text module 226 may then transform the audio input into a more suitable format and extract features such as Mel-frequency cepstral coefficients (MFCCs), which capture information about the frequency content of the audio signal over short time intervals. In some examples, speech-to-text module 226 may perform acoustic modeling (e.g., with Hidden Markov Models (HMMs)), which may involve training a statistical model that maps the extracted audio features to phonemes. The acoustic model may learn to associate specific audio features with phonemes while taking into account the variations in pronunciation, accents, and speaking styles. In some examples, speech-to-text module 226 may further implement language modeling (e.g., deep learning techniques, such as recurrent neural networks (RNNs) and transformers) to capture and predict a sequence of words or
phrases while considering the context in which the words are spoken (e.g., speech-to-text module 226 may use context information received by UI module 204). Speech-to-text module 226 may further use the trained acoustic and language models to decode the audio input and generate a transcription or sequence of words that best match the observed audio features. Speech-to-text module 226 may further implement post-processing techniques (e.g., grammar checks, contextual analysis, spell correction, etc.) to refine the transcription and improve readability and accuracy. Speech-to-text module 226 may then output the transcribed text that represents the audio input to machine learning module 210 for further processing and analysis.
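By way of illustration, the following Python sketch computes MFCC features from an audio file using the open-source librosa library, corresponding to the feature-extraction stage described above; the file path and sample rate are illustrative assumptions, and a production ASR pipeline would add the acoustic and language modeling stages.

    import numpy as np
    import librosa  # open-source audio analysis library

    def extract_mfccs(audio_path: str, n_mfcc: int = 13) -> np.ndarray:
        # Load audio, normalize volume, and compute MFCC features that capture
        # frequency content over short time intervals, as described above.
        signal, sample_rate = librosa.load(audio_path, sr=16000)
        signal = signal / (np.max(np.abs(signal)) + 1e-9)  # simple normalization
        return librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)

    # Example usage (the path is hypothetical):
    # mfccs = extract_mfccs("utterance.wav")
    # print(mfccs.shape)  # (13, number_of_frames)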
[0071] In general, machine learning module 210 may be configured to interpret both text and audio input received by UI module 204, such as to identify one or more tasks. In some examples, machine learning module 210 may be configured to draw inferences from any indication of a natural language user input. In other words, machine learning module 210 may infer desired capabilities from user intents. In some examples, machine learning module 210 may search the available capabilities for those that match the inferred intents. In some examples, machine learning module 210 may convert the audio or text input received by UI module 204, the transcribed text output from speech-to-text module 226, and/or any information stored in instructions storage 222 into structured text. For example, machine learning module 210 may convert any input or information to Extensible Markup Language (XML) or other structured text types, such as, but not limited to, HTML, JSON, CSV, INI files, etc. In this way, the information and input received by user interface generator module 208 can be provided to ML module 210 in a standardized format.
Furthermore, in some examples, machine learning module 210 may determine the type of information to include in the structured text representation. More specifically, machine learning module 210 may analyze various application functionality, capabilities, and attributes included in the information stored in instructions storage 222, such as content descriptions, roles, states, actions, and/or other relevant properties of user interface elements, the contextual information associated with the user input, the audio or text input received by UI module 204, and/or the transcribed text output from speech-to-text module 226.
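The following Python sketch illustrates one hypothetical way to convert identified tasks and retrieved context into a standardized structured-text representation, here JSON (XML, CSV, or another structured format could be produced analogously).

    import json

    def to_structured_text(tasks: list, context: list) -> str:
        # Convert identified tasks and retrieved context into a standardized,
        # structured representation (JSON here; XML, CSV, etc. would also work).
        record = {"tasks": [{"id": i, "text": t} for i, t in enumerate(tasks)],
                  "context": context}
        return json.dumps(record, indent=2)

    print(to_structured_text(["Send money to Mike"],
                             ["The user has a husband called Mike"]))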
[0072] In some examples, as discussed above, the received indication of the natural language user input may be preprocessed. In some examples, the information stored in instructions storage 222 may be preprocessed. Preprocessing techniques may include extracting one or more additional features from raw data. For example, feature extraction techniques may be applied to the user input or retrieved instructions to generate one or more new, additional features.
[0073] In general, machine learning module 210 may employ a large language model (LLM) that can interpret the indication of a natural language user input and generate a set of instructions associated with a user’s desired application functionality and corresponding graphical user interface. In some examples, machine learning module 210 may implement other machine-learned models that may be used in place of or in conjunction with the LLM model that is described with respect to FIGS. 3A, 3B, and 3C. Machine learning module 210 may perform various types of natural language processing (NLP) based on the indication of the natural language user input. The indication of the natural language user input, retrieved application information, context information, and/or other data (e.g., user data) received by computing system 200 may be referred to herein as “input data”. Thus, machine learning module 210 may apply one or more machine learning techniques to the input data. As described further below with respect to FIGS. 3A, 3B, and 3C, in some examples, machine learning module 210 may apply a language model to the indication of the natural language user input to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories. In some examples, machine learning module 210 may apply a machine learning model to the indication of the natural language user input to identify the one or more categories. As an example, UI module 204 may receive audio input including an utterance such as “Plan the trip with John next month, book Jane’s appointment, order jersey for Jack.” Speech-to-text module 226 may convert the audio input into a text string, which may then be parsed by machine learning module 210 to identify a first task, “Plan the trip with John next month,” a second task, “book Jane’s appointment,” and a third task, “order jersey for Jack.” In this example, computing system 200 may have previously determined a “Family” category, e.g., a “Family” category GUI may have previously been generated on a user’s computing device. As such, machine learning module 210 may determine, e.g., based on information stored in instructions storage 222, the second task and the third task to be associated with the “Family” category. However, machine learning module 210 may determine that the first task is not associated with any previously determined category. As such, machine learning module 210 may further determine a new category for the first task, e.g., by applying a machine learning model that can determine a word or phrase associated with the first task, such as “Trip.” Therefore, in this example, machine learning module 210 may determine the first task to be associated with a “Trip” category, and the second and third tasks to be associated with the “Family” category.
[0074] Machine learning module 210 may further apply, using the information associated with the plurality of functions, a machine learning model to the one or more tasks to generate
the set of instructions, in which the set of instructions provides at least one function for performing a respective task. For example, the set of instructions may provide functionality for performing one or more actions that complete a task. As an example, for performing a task such as “Plan the trip with John next month,” multiple actions (which may also be referred to herein as “subtasks”) may be involved, such as sending messages to John, sending funds to John, booking airline tickets, booking accommodations, finding attractions to visit, etc. In some examples, however, performing a single task and/or subtask may require functionality that has not been predefined by a single application. As such, using the information stored in instructions storage 222 (e.g., application API response data, user input, and user data), machine learning module 210 may generate code that provides “new” functionality for performing a task and/or subtask. As an example, for the subtask of booking airline tickets, machine learning module 210 may use information and existing functionality retrieved from a messaging application, a calendar application, and an airline application to generate new code that provides functionality for purchasing a specific airline ticket based on, for example, messages received from John indicating dates of travel, a user’s schedule for those dates of travel, and historical user data indicating the user’s preferred airline ticket class. In this way, rather than having to navigate through multiple applications and acquire relevant information manually, users may simply speak a command such as, “Plan the trip with John next month,” and computing system 200 may automatically determine relevant information and subtasks needed for performing the identified task of planning the trip.
[0075] Furthermore, in general, the set of instructions may further include instructions for generating at least one GUI associated with a respective category, in which the at least one GUI may include at least one graphical component associated with the at least one function for performing the respective task. As an example, continuing the example above, the instructions may include instructions for generating a “Trip” GUI, in which the “Trip” GUI may include a widget for each identified subtask. For example, the “Trip” GUI may include a “Book Flight” widget that further includes a “Book This Flight” button for purchasing the specific airline ticket. That is, the “Book This Flight” button may be associated with the new functionality for purchasing the specific airline ticket, and may act as a “shortcut” for the user in performing the task of purchasing the specific airline ticket.
[0076] FIG. 3A is a conceptual diagram illustrating an example training process for a machine learning module, in accordance with one or more techniques of this disclosure. In some examples, computing device 112 of FIG. 1 may store and implement machine learning module 310 locally (i.e., on-device). Thus, in some examples, machine learning module 310
can be stored at and/or implemented locally by an embedded device or a user computing device such as a mobile device. Output data obtained through local implementation of machine learning module 310 at the embedded device or the user computing device can be used to improve performance of the embedded device or the user computing device (e.g., an application implemented by the embedded device or the user computing device). Machine learning module 310 described herein can be trained at a training computing system, and then provided for storage and/or implementation at one or more computing devices, such as computing device 112 of FIG. 1. In some examples, training process 340 executes locally at computing system 100 of FIG. 1. However, in some examples, training process 340 can be included in or separate from any computing system that implements machine learning module 310.
[0077] In general, machine learning module 310 may be or include one or more inference models, i.e., one or more trained machine learning models that can be used to make predictions based on new, unseen data. Machine learning module 310 may “infer” conclusions or outputs, which may be predictions, classifications, recommendations, or other types of decision-making. Machine learning module 310 may be trained according to one or more of various different training types or techniques. For example, in some examples, machine learning module 310 may be trained by training process 340 of FIG. 3A.
[0078] As further shown in the example of FIG. 3A, in some examples, machine learning module 310 may be trained on training data 331 that may include input data 333 that has labels 337. The training process shown in FIG. 3A is one example training process; other training processes may be used as well. In general, during training process 340, machine learning module 310 may learn patterns from training data 331, and training process 340 may optimize parameters for machine learning module 310 to minimize prediction errors.
[0079] Training data 331 can include, upon user permission for use of such data for training, anonymized usage logs of sharing flows, e.g., content items that were shared together, bundled content pieces already identified as belonging together, e.g., from entities in a knowledge graph, etc. In some examples, training data 331 can include examples of input data 333 that have been assigned labels 337 that correspond to output data 335.
[0080] In some examples, machine learning module 310 can be trained by optimizing an objective function, such as objective function 339. For example, in some examples, objective function 339 may be or include a loss function that compares (e.g., determines a difference between) output data generated by the model from the training data and labels (e.g., ground-truth labels) associated with the training data. For example, the loss function can evaluate a
sum or mean of squared differences between output data 335 and the labels. In some examples, objective function 339 may be or include a cost function that describes a cost of a certain outcome or output data. Other examples of objective function 339 can include margin-based techniques such as, for example, triplet loss or maximum-margin training.
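As a non-limiting illustration of a loss function of this kind, the following Python sketch evaluates the mean of squared differences between model outputs and ground-truth labels.

    import numpy as np

    def mean_squared_error(outputs: np.ndarray, labels: np.ndarray) -> float:
        # Loss-function form of an objective function: the mean of squared
        # differences between model output data and ground-truth labels.
        return float(np.mean((outputs - labels) ** 2))

    print(mean_squared_error(np.array([0.9, 0.2]), np.array([1.0, 0.0])))  # 0.025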
[0081] One or more of various optimization techniques can be performed to optimize objective function 339. For example, the optimization technique(s) can minimize or maximize objective function 339. Example optimization techniques include Hessian-based techniques and gradient-based techniques, such as, for example, coordinate descent; gradient descent (e.g., stochastic gradient descent); subgradient methods; etc. Other optimization techniques include black box optimization techniques and heuristics.
[0082] In some examples, backward propagation of errors can be used in conjunction with an optimization technique (e.g., gradient-based techniques) to train machine learning module 310 (e.g., when a machine-learned model is a multi-layer model such as an artificial neural network). For example, an iterative cycle of propagation and model parameter (e.g., weights) update can be performed to train machine learning module 310. Example backpropagation techniques include truncated backpropagation through time, Levenberg-Marquardt backpropagation, etc.
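The following Python sketch illustrates the iterative propagate-and-update cycle described above using the open-source PyTorch library: a forward pass, a loss computation, backward propagation of errors, and a stochastic gradient descent parameter update. The model architecture and hyperparameters are illustrative assumptions.

    import torch
    from torch import nn

    # Minimal sketch of the propagate-and-update cycle.
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent
    loss_fn = nn.MSELoss()

    inputs, targets = torch.randn(32, 4), torch.randn(32, 1)
    for step in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()   # backpropagation computes parameter gradients
        optimizer.step()  # gradient-based update of the model weights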
[0083] In some examples, machine learning module 310 described herein can be trained using unsupervised learning techniques. Unsupervised learning can include inferring a function to describe hidden structure from unlabeled data. For example, a classification or categorization may not be included in the data. Unsupervised learning techniques can be used to produce machine-learned models capable of performing clustering, anomaly detection, learning latent variable models, or other tasks.
[0084] Machine learning module 310 can be trained using semi-supervised techniques which combine aspects of supervised learning and unsupervised learning. Machine learning module 310 can be trained or otherwise generated through evolutionary techniques or genetic algorithms. In some examples, machine learning module 310 described herein can be trained using reinforcement learning. In reinforcement learning, an agent (e.g., model) can take actions in an environment and learn to maximize rewards and/or minimize penalties that result from such actions. Reinforcement learning can differ from the supervised learning problem in that correct input/output pairs are not presented, nor sub-optimal actions explicitly corrected.
[0085] In some examples, one or more generalization techniques can be performed during training to improve the generalization of machine learning module 310. Generalization
techniques can help reduce overfitting of machine learning module 310 to the training data. Example generalization techniques include dropout techniques; weight decay techniques; batch normalization; early stopping; subset selection; stepwise selection; etc.
[0086] In some examples, machine learning module 310 described herein can include or otherwise be impacted by a number of hyperparameters, such as, for example, learning rate, number of layers, number of nodes in each layer, number of leaves in a tree, number of clusters; etc. Hyperparameters can affect model performance. Hyperparameters can be hand selected or can be automatically selected through application of techniques such as, for example, grid search; black box optimization techniques (e.g., Bayesian optimization, random search, etc.); gradient-based optimization; etc. Example techniques and/or tools for performing automatic hyperparameter optimization include Hyperopt; Auto-WEKA; Spearmint; Metric Optimization Engine (MOE); etc.
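As a minimal sketch of automatic hyperparameter selection via grid search, the following Python example exhaustively scores every combination in a small grid; train_and_evaluate is a hypothetical stand-in for a real training-plus-validation run.

from itertools import product

def grid_search(train_and_evaluate, grid):
    best_score, best_params = float("-inf"), None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_and_evaluate(**params)  # e.g., validation accuracy
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

grid = {"learning_rate": [1e-3, 1e-2], "num_layers": [2, 4, 8]}
# A toy objective standing in for an actual training run.
best, _ = grid_search(
    lambda learning_rate, num_layers: -abs(learning_rate - 1e-2) - abs(num_layers - 4),
    grid)
print(best)  # {'learning_rate': 0.01, 'num_layers': 4}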
[0087] In some examples, various techniques can be used to optimize and/or adapt the learning rate when the model is trained. Example techniques and/or tools for performing learning rate optimization or adaptation include Adagrad; Adaptive Moment Estimation (ADAM); Adadelta; RMSprop; etc.
[0088] In some examples, transfer learning techniques can be used to provide an initial model from which to begin training of machine learning module 310 described herein. In some examples, transfer learning involves reusing a model and its model parameters obtained while solving one problem and applying them to a different but related problem. Models trained on very large data sets may be retrained or fine-tuned on additional data. Often, all model designs and their parameters of a source model are copied except the output layer(s). The output layer(s) are often called the head, and the other layers are often called the base. The source parameters may be considered to contain the knowledge learned from the source dataset, and this knowledge may also be applicable to a target dataset. Fine-tuning may include updating the head parameters while the base parameters are kept fixed or updated in a later step.
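The head/base split described above can be sketched as follows in PyTorch; the architecture and layer sizes are made-up stand-ins, and this shows one possible arrangement rather than the disclosed design. The base is frozen so only the new head is updated.

import torch.nn as nn
import torch.optim as optim

base = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # reused "base" layers
head = nn.Linear(64, 10)                             # new output "head"
model = nn.Sequential(base, head)

for param in base.parameters():
    param.requires_grad = False  # keep source-task knowledge fixed

# Only the head's parameters are handed to the optimizer for fine-tuning.
optimizer = optim.SGD(head.parameters(), lr=1e-3)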
[0089] In some examples, machine learning module 310 may be trained in an offline fashion or an online fashion. In offline training (also known as batch learning), machine learning module 310 is trained on the entirety of a static set of training data. In online learning, machine learning module 310 is continuously trained (or re-trained) as new training data becomes available (e.g., while the model is used to perform inference).
[0090] In some examples, training process 340 may involve centralized training of machine learning module 310 (e.g., based on a centrally stored dataset). In other implementations,
decentralized training techniques such as distributed training, federated learning, or the like can be used to train, update, or personalize machine learning module 310.
[0091] Machine learning module 310 described herein can be trained according to one or more of various different training types or techniques. For example, in some examples, machine learning module 310 can be trained by training process 340 using supervised learning, in which machine learning module 310 is trained on a training dataset that includes instances or examples that have labels. The labels can be manually applied by experts, generated through crowd-sourcing, or provided by other techniques (e.g., by physics-based or complex mathematical models). In some examples, if the user has provided consent, the training examples can be provided by the user computing device. In some examples, this process can be referred to as personalizing the model.
[0092] In some examples, machine learning module 310 includes a language model that may be trained (e.g., pre-trained, fine-tuned, etc.) by training process 340. For example, training process 340 may pre-train a language model on a large and diverse corpus of text. As such, in some examples, training data 331 may include a dataset that covers a wide range of topics and domains to ensure machine learning module 310 learns diverse linguistic patterns and contextual relationships. Training process 340 may train a language model to optimize objective function 339. Objective function 339 may be or include a loss function, such as cross-entropy loss, that compares (e.g., determines a difference between) output data generated by the model from training data 331 and labels 337 (e.g., ground-truth labels) associated with training data 331. For example, objective function 339 for a language model may be to correctly predict the next word in a sequence of words or correctly fill in missing words as much as possible.
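For illustration, the next-word objective can be sketched as a cross-entropy between the model's predicted distribution over a vocabulary and the ground-truth next token; the toy vocabulary and probabilities below are illustrative only.

import math

def cross_entropy(predicted_probs, true_index):
    # Loss is low when the model assigns high probability to the true token.
    return -math.log(predicted_probs[true_index])

vocab = ["the", "cat", "sat"]
predicted = [0.1, 0.7, 0.2]  # model's distribution over the next word
print(round(cross_entropy(predicted, vocab.index("cat")), 3))  # 0.357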
[0093] In some examples, training process 340 may use techniques such as low-rank adaptation (LoRA) to train or fine-tune large language models (LLMs) implemented by machine learning module 310. In general, LoRA may reduce the number of trainable parameters by freezing pre-trained weights of an LLM and injecting small, trainable low-rank matrices that adapt the model for specific tasks. LoRA may be useful when a model needs to be adapted to multiple tasks with limited task-specific data. That is, training process 340 may use LoRA for task-specific fine-tuning. In some examples, training process 340 may use techniques such as retrieval-augmented generation (RAG), which is a hybrid framework that combines information retrieval with text generation. RAG may be used to fine-tune a generative model implemented by machine learning module 310 by retrieving relevant information from an external database or dataset (e.g., a large and diverse corpus of text) and using that
information to generate output that is more accurate and informative. RAG may be useful for generating more factually accurate and contextually relevant summaries and responses to questions.
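A minimal numpy sketch of the LoRA idea follows: the pre-trained weight matrix W is frozen, and only the small low-rank factors A and B are trainable. The dimensions and initialization are illustrative assumptions, not the disclosed configuration.

import numpy as np

d, r = 512, 8                      # model width and (much smaller) rank
W = np.random.randn(d, d)          # frozen pre-trained weights
A = np.random.randn(d, r) * 0.01   # trainable low-rank factor
B = np.zeros((r, d))               # trainable; zero-initialized so the
                                   # adapter starts as a no-op

def adapted_forward(x):
    # Frozen layer output plus the low-rank task-specific update.
    return x @ W + (x @ A) @ B

# Trainable parameters drop from d*d (262,144) to 2*d*r (8,192).
print(d * d, 2 * d * r)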
[0094] In some examples, training process 340 may continuously or periodically train a language model included in machine learning module 310. In some examples, training process 340 may fine-tune a language model by using feedback in the training process. For example, UI component 202 of FIG. 2 may receive a user input via a computing device that selects feedback (e.g., thumbs up, thumbs down, etc.) relating to the generated application functionality and associated GUIs that are presented to the user on the computing device. In some examples, the feedback may indicate whether the generated application functionality and associated GUIs are accurate or inaccurate, correct or incorrect, high quality or low quality, etc. UI module 204 may receive this feedback and may send it to user interface generator module 208. User interface generator module 208 may transmit the feedback to machine learning module 310 (specifically to training process 340), in which training process 340 uses the feedback for training. For example, training process 340 may convert the feedback into labeled data for supervised training. Additionally or alternatively, training process 340 may fine-tune a language model by monitoring the relationship between the performance of the language model and user feedback, and iterate the fine-tuning process as necessary (e.g., to receive more positive user feedback and less negative user feedback). In this way, the techniques of this disclosure may establish a feedback loop that continuously improves the quality of output data 335 (e.g., an instructions file) of a language model.
[0095] FIG. 3B is a conceptual diagram illustrating an example trained machine learning module, in accordance with one or more techniques of this disclosure. In some examples, computing device 112 of FIG. 1 may store and implement machine learning module 310 locally (i.e., on-device). Thus, in some examples, machine learning module 310 can be stored at and/or implemented locally by an embedded device or a user computing device such as a mobile device. Output data obtained through local implementation of machine learning module 310 at the embedded device or the user computing device can be used to improve performance of the embedded device or the user computing device (e.g., an application implemented by the embedded device or the user computing device). Machine learning module 310 of FIG. 3B may be trained at a computing system, such as computing system 100 of FIG. 1, and then provided for storage and/or implementation at one or more computing devices, such as computing device 112 of FIG. 1. In some examples, machine learning module
310 executes locally at computing system 100 of FIG. 1. In some examples, computing system 100 may perform machine learning as a service.
[0096] As illustrated in FIG. 3B, in some examples, machine learning module 310 is trained (e.g., via training process 340 of FIG. 3A) to receive input data 333, which may be of one or more types and, in response, provide output data 335, which may be of one or more types. Thus, FIG. 3B illustrates machine learning module 310 performing inference, in which machine learning module 310 may use learned patterns to make predictions or decisions on new data, e.g., input data 333. Machine learning module 310 may include one or more machine-learned models trained by training process 340 of FIG. 3A.
[0097] Input data 333 may include one or more features that are associated with an instance or an example. In some examples, the one or more features associated with the instance or example can be organized into a feature vector. In some examples, output data 335 can include one or more predictions. Predictions can also be referred to as inferences. Thus, given features associated with a particular instance, machine learning module 310 can output a prediction for such instance based on the features.
[0098] Machine learning module 310 can be or include one or more of various different types of machine-learned models. In particular, in some examples, machine learning module 310 may perform NLP tasks. Machine learning module 310 may summarize, translate, or organize input data 333. Machine learning module 310 may use recurrent neural networks (RNNs) and/or transformer models (self-attention models). Example models may include, but are not limited to, GPT-3, BERT, Gemini (e.g., Gemini Ultra, Gemini Pro, Gemini Flash, Gemini Nano), Android AICore, and T5. In some examples, machine learning module 310 may perform classification, summarization, name generation, regression, clustering, anomaly detection, recommendation generation, and/or other tasks.
[0099] In some examples, machine learning module 310 can perform various types of classification based on input data 333. For example, machine learning module 310 can perform binary classification or multiclass classification. In binary classification, output data 335 can include a classification of input data 333 into one of two different classes. In multiclass classification, output data 335 can include a classification of input data 333 into one (or more) of more than two classes. The classifications can be single label or multi-label. Machine learning module 310 may perform discrete categorical classification in which input data 333 is simply classified into one or more classes or categories.
[0100] In some examples, machine learning module 310 can perform classification in which machine learning module 310 provides, for each of one or more classes, a numerical value
descriptive of a degree to which it is believed that input data 333 should be classified into the corresponding class. In some instances, the numerical values provided by machine learning module 310 can be referred to as “confidence scores” that are indicative of a respective confidence associated with classification of the input into the respective class. In some examples, the confidence scores can be compared to one or more thresholds to render a discrete categorical prediction. In some examples, only a certain number of classes (e.g., one) with the relatively largest confidence scores can be selected to render a discrete categorical prediction.
[0101] Machine learning module 310 may output a probabilistic classification. For example, machine learning module 310 may predict, given a sample input, a probability distribution over a set of classes. Thus, rather than outputting only the most likely class to which the sample input should belong, machine learning module 310 can output, for each class, a probability that the sample input belongs to such class. In some examples, the probability distribution over all possible classes can sum to one. In some examples, a Softmax function, or other type of function or layer, can be used to squash a set of real values respectively associated with the possible classes to a set of real values in the range (0, 1) that sum to one.
[0102] In some examples, the probabilities provided by the probability distribution can be compared to one or more thresholds to render a discrete categorical prediction. In some examples, only a certain number of classes (e.g., one) with the relatively largest predicted probability can be selected to render a discrete categorical prediction.
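The softmax-plus-threshold pipeline described in the two preceding paragraphs can be sketched as follows; the scores and the 0.6 confidence threshold are hypothetical values.

import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]  # numerically stable
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.5, 0.1])
print(round(sum(probs), 6))  # the distribution sums to 1.0

# Render a discrete categorical prediction only if the top class clears
# the (hypothetical) confidence threshold.
threshold = 0.6
top = max(range(len(probs)), key=probs.__getitem__)
prediction = top if probs[top] >= threshold else None
print(prediction)  # class 0 in this toy example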
[0103] In cases in which machine learning module 310 performs classification, machine learning module 310 may be trained using supervised learning techniques. For example, machine learning module 310 may be trained on a training dataset that includes training examples labeled as belonging (or not belonging) to one or more classes.
[0104] In some examples, machine learning module 310 can perform regression to provide output data in the form of a continuous numeric value. The continuous numeric value can correspond to any number of different metrics or numeric representations, including, for example, currency values, scores, or other numeric representations. As examples, machine learning module 310 can perform linear regression, polynomial regression, or nonlinear regression. As examples, machine learning module 310 can perform simple regression or multiple regression. As described above, in some examples, a Softmax function or other function or layer can be used to squash a set of real values respectively associated with two or more possible classes to a set of real values in the range (0, 1) that sum to one.
[0105] Machine learning module 310 may perform various types of clustering. For example, machine learning module 310 can identify one or more previously-defined clusters to which input data 333 most likely corresponds. Machine learning module 310 may identify one or more clusters within input data 333. That is, in instances in which input data 333 includes multiple objects, documents, or other entities, machine learning module 310 can sort the multiple entities included in input data 333 into a number of clusters. In some examples in which machine learning module 310 performs clustering, machine learning module 310 can be trained using unsupervised learning techniques.
[0106] Machine learning module 310 may perform anomaly detection or outlier detection. For example, machine learning module 310 can identify input data that does not conform to an expected pattern or other characteristic (e.g., as previously observed from previous input data). As examples, the anomaly detection can be used for fraud detection or system failure detection.
[0107] In some examples, machine learning module 310 can provide output data in the form of one or more recommendations. For example, machine learning module 310 can be included in a recommendation system or engine. As an example, given input data that describes previous outcomes for certain entities (e.g., a score, ranking, or rating indicative of an amount of success or enjoyment), machine learning module 310 can output a suggestion or recommendation of one or more additional entities that, based on the previous outcomes, are expected to have a desired outcome (e.g., elicit a score, ranking, or rating indicative of success or enjoyment). As one example, given input data descriptive of a context of a computing device, such as computing device 112 of FIG. 1, a recommendation system can output a suggestion or recommendation of an application that the user might enjoy or wish to download to computing device 112.
[0108] Machine learning module 310 may, in some cases, act as an agent within an environment. For example, machine learning module 310 can be trained using reinforcement learning, which will be discussed in further detail below.
[0109] In some examples, machine learning module 310 can be a parametric model while, in other implementations, machine learning module 310 can be a non-parametric model. In some examples, machine learning module 310 can be a linear model while, in other implementations, machine learning module 310 can be a non-linear model.
[0110] As described above, machine learning module 310 can be or include one or more of various different types of machine-learned models. Examples of such different types of machine-learned models are provided below for illustration. One or more of the example
models described below can be used (e.g., combined) to provide output data 335 in response to input data 333. Additional models beyond the example models provided below can be used as well.
[0111] In some examples, machine learning module 310 can be or include one or more classifier models such as, for example, linear classification models; quadratic classification models; etc. Machine learning module 310 may be or include one or more regression models such as, for example, simple linear regression models; multiple linear regression models; logistic regression models; stepwise regression models; multivariate adaptive regression splines; locally estimated scatterplot smoothing models; etc.
[0112] In some examples, machine learning module 310 can be or include one or more decision tree-based models such as, for example, classification and/or regression trees; iterative dichotomiser 3 decision trees; C4.5 decision trees; chi-squared automatic interaction detection decision trees; decision stumps; conditional decision trees; etc.
[0113] Machine learning module 310 may be or include one or more kernel machines. In some examples, machine learning module 310 can be or include one or more support vector machines. Machine learning module 310 may be or include one or more instance-based learning models such as, for example, learning vector quantization models; self-organizing map models; locally weighted learning models; etc. In some examples, machine learning module 310 can be or include one or more nearest neighbor models such as, for example, k-nearest neighbors classification models; k-nearest neighbors regression models; etc. Machine learning module 310 can be or include one or more Bayesian models such as, for example, naive Bayes models; Gaussian naive Bayes models; multinomial naive Bayes models; averaged one-dependence estimators; Bayesian networks; Bayesian belief networks; hidden Markov models; etc.
[0114] In some examples, machine learning module 310 can be or include one or more artificial neural networks (also referred to simply as neural networks). A neural network can include a group of connected nodes, which also can be referred to as neurons or perceptrons. A neural network can be organized into one or more layers. Neural networks that include multiple layers can be referred to as “deep” networks. A deep network can include an input layer, an output layer, and one or more hidden layers positioned between the input layer and the output layer. The nodes of the neural network can be fully connected or non-fully connected.
[0115] Machine learning module 310 can be or include one or more feedforward neural networks. In feedforward networks, the connections between nodes do not form a cycle. For example, each connection can connect a node from an earlier layer to a node from a later layer.
[0116] In some instances, machine learning module 310 can be or include one or more recurrent neural networks. In some instances, at least some of the nodes of a recurrent neural network can form a cycle. Recurrent neural networks can be especially useful for processing input data that is sequential in nature. In particular, in some instances, a recurrent neural network can pass or retain information from a previous portion of the input data 333 sequence to a subsequent portion of the sequence through the use of recurrent or directed cyclical node connections.
[0117] In some examples, sequential input data can include time-series data (e.g., sensor data versus time or imagery captured at different times). For example, a recurrent neural network can analyze sensor data versus time to detect or predict a swipe direction, to perform handwriting recognition, etc. Sequential input data may include words in a sentence (e.g., for natural language processing, speech detection or processing, etc.); notes in a musical composition; sequential actions taken by a user (e.g., to detect or predict sequential application usage); sequential object states; etc.
[0118] Example recurrent neural networks include long short-term memory (LSTM) recurrent neural networks; gated recurrent units; bi-directional recurrent neural networks; continuous time recurrent neural networks; neural history compressors; echo state networks; Elman networks; Jordan networks; recursive neural networks; Hopfield networks; fully recurrent networks; sequence-to-sequence configurations; etc.
[0119] In some examples, machine learning module 310 can be or include one or more convolutional neural networks. In some instances, a convolutional neural network can include one or more convolutional layers that perform convolutions over input data using learned filters.
[0120] Filters can also be referred to as kernels. Convolutional neural networks can be especially useful for vision problems such as when input data 333 includes imagery such as still images or video. However, convolutional neural networks can also be applied for natural language processing.
[0121] In some examples, machine learning module 310 can be or include one or more generative networks such as, for example, generative adversarial networks. Generative networks can be used to generate new data such as new images or other content.
[0122] Machine learning module 310 may be or include an autoencoder. In some instances, the aim of an autoencoder is to learn a representation (e.g., a lower-dimensional encoding)
for a set of data, typically for the purpose of dimensionality reduction. For example, in some instances, an autoencoder can seek to encode input data 333 and then provide output data that reconstructs input data 333 from the encoding. Recently, the autoencoder concept has become more widely used for learning generative models of data. In some instances, the autoencoder can include additional losses beyond reconstructing input data 333.
[0123] Machine learning module 310 may be or include one or more other forms of artificial neural networks such as, for example, deep Boltzmann machines; deep belief networks; stacked autoencoders; etc. Any of the neural networks described herein can be combined (e.g., stacked) to form more complex networks.
[0124] One or more neural networks can be used to provide an embedding based on input data 333. For example, the embedding can be a representation of knowledge abstracted from input data 333 into one or more learned dimensions. In some instances, embeddings can be a useful source for identifying related entities. In some instances, embeddings can be extracted from the output of the network, while in other instances embeddings can be extracted from any hidden node or layer of the network (e.g., a close-to-final but not final layer of the network). Embeddings can be useful for tasks such as auto-suggesting a next video, product suggestion, entity or object recognition, etc. In some instances, embeddings can be useful inputs for downstream models. For example, embeddings can be useful to generalize input data (e.g., search queries) for a downstream model or processing system.
[0125] Machine learning module 310 may include one or more clustering models such as, for example, k-means clustering models; k-medians clustering models; expectation maximization models; hierarchical clustering models; etc.
[0126] In some examples, machine learning module 310 can perform one or more dimensionality reduction techniques such as, for example, principal component analysis; kernel principal component analysis; graph-based kernel principal component analysis; principal component regression; partial least squares regression; Sammon mapping; multidimensional scaling; projection pursuit; linear discriminant analysis; mixture discriminant analysis; quadratic discriminant analysis; generalized discriminant analysis; flexible discriminant analysis; autoencoding; etc.
[0127] In some examples, machine learning module 310 can perform or be subjected to one or more reinforcement learning techniques such as Markov decision processes; dynamic programming; Q functions or Q-learning; value function approaches; deep Q-networks; differentiable neural computers; asynchronous advantage actor-critics; deterministic policy gradient; etc.
[0128] In some examples, machine learning module 310 can be an autoregressive model. In some instances, an autoregressive model can specify that output data 335 depends linearly on its own previous values and on a stochastic term. In some instances, an autoregressive model can take the form of a stochastic difference equation. One example autoregressive model is WaveNet, which is a generative model for raw audio.
[0129] In some examples, machine learning module 310 can include or form part of a multiple model ensemble. As one example, bootstrap aggregating can be performed, which can also be referred to as “bagging.” In bootstrap aggregating, a training dataset is split into a number of subsets (e.g., through random sampling with replacement) and a plurality of models are respectively trained on the number of subsets. At inference time, respective outputs of the plurality of models can be combined (e.g., through averaging, voting, or other techniques) and used as the output of the ensemble.
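For illustration, bootstrap aggregating can be sketched as below; train_model is a hypothetical stand-in for any base learner's training routine, and majority voting is used to combine classification outputs.

import random
from collections import Counter

def bag(train_model, dataset, n_models=10):
    models = []
    for _ in range(n_models):
        # Random sampling with replacement produces each training subset.
        subset = [random.choice(dataset) for _ in range(len(dataset))]
        models.append(train_model(subset))
    return models

def predict_by_vote(models, x):
    # Combine the members' outputs by majority vote (classification).
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]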
[0130] One example ensemble is a random forest, which can also be referred to as a random decision forest. Random forests are an ensemble learning method for classification, regression, and other tasks. Random forests are generated by producing a plurality of decision trees at training time. In some instances, at inference time, the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees can be used as the output of the forest. Random decision forests can correct for decision trees' tendency to overfit their training set.
[0131] Another example ensemble technique is stacking, which can, in some instances, be referred to as stacked generalization. Stacking includes training a combiner model to blend or otherwise combine the predictions of several other machine-learned models. Thus, a plurality of machine-learned models (e.g., of same or different type) can be trained based on training data. In addition, a combiner model can be trained to take the predictions from the other machine-learned models as inputs and, in response, produce a final inference or prediction. In some instances, a single-layer logistic regression model can be used as the combiner model.
[0132] Another example of an ensemble technique is boosting. Boosting can include incrementally building an ensemble by iteratively training weak models and then adding to a final strong model. For example, in some instances, each new model can be trained to emphasize the training examples that previous models misinterpreted (e.g., misclassified).
For example, a weight associated with each of such misinterpreted examples can be increased. One common implementation of boosting is AdaBoost, which can also be referred to as Adaptive Boosting. Other example boosting techniques include LPBoost; TotalBoost; BrownBoost; xgboost; MadaBoost; LogitBoost; gradient boosting; etc. Furthermore, any of the models described above (e.g., regression models and artificial neural networks) can be combined to form an ensemble. As an example, an ensemble can include a top-level machine-learned model or a heuristic function to combine and/or weight the outputs of the models that form the ensemble.
[0133] In some examples, multiple machine-learned models (e.g., that form an ensemble) can be linked and trained jointly (e.g., through backpropagation of errors sequentially through the model ensemble). However, in some examples, only a subset (e.g., one) of the jointly trained models is used for inference.
[0134] In some examples, machine learning module 310 can be used to preprocess input data 333 for subsequent input into another model. For example, machine learning module 310 can perform dimensionality reduction techniques and embeddings (e.g., matrix factorization, principal components analysis, singular value decomposition, word2vec/GloVe, and/or related approaches); clustering; and even classification and regression for downstream consumption.
[0135] As discussed above, machine learning module 310 can be trained or otherwise configured to receive input data 333 and, in response, provide output data 335. Input data 333 can include different types, forms, or variations of input data. As examples, in various implementations, input data 333 can include features that describe the content (or portion of content) initially selected by the user, e.g., content of a user-selected document or image, links pointing to the user selection, links within the user selection relating to other files available on device or in the cloud, metadata of the user selection, etc. Additionally, with user permission, input data 333 can include the context of user usage, obtained either from the app itself or from other sources. Examples of usage context include breadth of share (sharing publicly, or with a large group, or privately, or with a specific person), context of share, etc. When permitted by the user, additional input data can include the state of the device, e.g., the location of the device, the apps running on the device, etc.
[0136] In some examples, machine learning module 310 can receive and use input data 333 in its raw form. In some examples, the raw input data can be preprocessed. Thus, in addition or alternatively to the raw input data, machine learning module 310 can receive and use the preprocessed input data.
[0137] In some examples, preprocessing input data 333 can include extracting one or more additional features from the raw input data. For example, feature extraction techniques can be applied to input data 333 to generate one or more new, additional features. Example feature extraction techniques include edge detection; corner detection; blob detection; ridge
detection; scale-invariant feature transform; motion detection; optical flow; Hough transform; etc.
[0138] In some examples, the extracted features can include or be derived from transformations of input data 333 into other domains and/or dimensions. As an example, the extracted features can include or be derived from transformations of input data 333 into the frequency domain. For example, wavelet transformations and/or fast Fourier transforms can be performed on input data 333 to generate additional features.
[0139] In some examples, the extracted features can include statistics calculated from input data 333 or certain portions or dimensions of input data 333. Example statistics include the mode, mean, maximum, minimum, or other metrics of input data 333 or portions thereof.
[0140] In some examples, as described above, input data 333 can be sequential in nature. In some instances, the sequential input data can be generated by sampling or otherwise segmenting a stream of input data. As one example, frames can be extracted from a video. In some examples, sequential data can be made non-sequential through summarization.
[0141] As another example preprocessing technique, portions of input data 333 can be imputed. For example, additional synthetic input data can be generated through interpolation and/or extrapolation.
[0142] As another example preprocessing technique, some or all of input data 333 can be scaled, standardized, normalized, generalized, and/or regularized. Example regularization techniques include ridge regression; least absolute shrinkage and selection operator (LASSO); elastic net; least-angle regression; cross-validation; L1 regularization; L2 regularization; etc. As one example, some or all of input data 333 can be normalized by subtracting the mean across a given dimension’s feature values from each individual feature value and then dividing by the standard deviation or other metric.
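A minimal sketch of that normalization step, assuming nothing beyond the Python standard library:

import statistics

def standardize(values):
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero spread
    return [(v - mean) / std for v in values]

print(standardize([2.0, 4.0, 6.0]))  # zero mean, unit variance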
[0143] As another example preprocessing technique, some or all of input data 333 can be quantized or discretized. In some cases, qualitative features or variables included in input data 333 can be converted to quantitative features or variables. For example, one-hot encoding can be performed.
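One-hot encoding can be sketched as follows; the category list is illustrative.

def one_hot(value, categories):
    # Qualitative feature -> quantitative indicator vector.
    return [1 if value == c else 0 for c in categories]

print(one_hot("green", ["red", "green", "blue"]))  # [0, 1, 0]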
[0144] In some examples, dimensionality reduction techniques can be applied to input data 333 prior to input into machine learning module 310. Several examples of dimensionality reduction techniques are provided above, including, for example, principal component analysis; kernel principal component analysis; graph-based kernel principal component analysis; principal component regression; partial least squares regression; Sammon mapping; multidimensional scaling; projection pursuit; linear discriminant analysis; mixture
discriminant analysis; quadratic discriminant analysis; generalized discriminant analysis; flexible discriminant analysis; autoencoding; etc.
[0145] In some examples, during training, input data 333 can be intentionally deformed in any number of ways to increase model robustness, generalization, or other qualities. Example techniques to deform input data 333 include adding noise; changing color, shade, or hue; magnification; segmentation; amplification; etc.
[0146] In response to receipt of input data 333, machine learning module 310 can provide output data 335. Output data 335 can include different types, forms, or variations of output data. As examples, in various implementations, output data 335 can include content, either stored locally on the user device or in the cloud, that is relevantly shareable along with the initial content selection.
[0147] As discussed above, in some examples, output data 335 can include various types of classification data (e.g., binary classification, multiclass classification, single label, multi-label, discrete classification, regressive classification, probabilistic classification, etc.) or can include various types of regressive data (e.g., linear regression, polynomial regression, nonlinear regression, simple regression, multiple regression, etc.). In other instances, output data 335 can include clustering data, anomaly detection data, recommendation data, or any of the other forms of output data discussed above.
[0148] In some examples, output data 335 can influence downstream processes or decision making. As one example, in some examples, output data 335 can be interpreted and/or acted upon by a rules-based regulator.
[0149] Any of the different types or forms of input data described herein can be combined with any of the different types or forms of machine-learned models described herein to provide any of the different types or forms of output data described herein.
[0150] The systems and methods of the present disclosure can be implemented by or otherwise executed on one or more computing devices. Example computing devices include user computing devices (e.g., laptops, desktops, and mobile computing devices such as tablets, smartphones, wearable computing devices, etc.); embedded computing devices (e.g., devices embedded within a vehicle, camera, image sensor, industrial machine, satellite, gaming console or controller, or home appliance such as a refrigerator, thermostat, energy meter, home energy manager, smart home assistant, etc.); server computing devices (e.g., database servers, parameter servers, file servers, mail servers, print servers, web servers, game servers, application servers, etc.); dedicated, specialized model processing or training devices; virtual computing devices; other computing devices or computing infrastructure; or
combinations thereof. A computing system that implements machine learning module 310 or other aspects of the present disclosure may include a number of hardware components that enable the performance of the techniques described herein.
[0151] In some instances, output data 335 obtained through machine learning module 310 at a computing system or device can be used to improve other device tasks or can be used by other non-user devices to improve services performed by or for such other non-user devices. For example, output data 335 can improve other downstream processes performed by a server device for a computing device of a user or embedded computing device. In other instances, output data 335 obtained through implementation of machine learning module 310 at a computing system or device can be sent to and used by a user computing device, an embedded computing device, or some other client device. In some examples, computing system 200 of FIG. 2 may perform machine learning as a service.
[0152] In yet other implementations, different respective portions of machine learning module 310 can be stored at and/or implemented by some combination of a user computing device; an embedded computing device; a server computing device; etc. In other words, portions of machine learning module 310 may be distributed in whole or in part amongst a client device (e.g., computing device 112 of FIG. 1) and a computing system (e.g., computing system 100 of FIG. 1).
[0153] A computing device such as computing device 112 of FIG. 1 may perform graph processing techniques or other machine learning techniques using one or more machine learning platforms, frameworks, and/or libraries, such as, for example, TensorFlow, Caffe/Caffe2, Theano, Torch/PyTorch, MXNet, CNTK, etc.
[0154] In some examples, multiple instances of machine learning module 310 can be parallelized to provide increased processing throughput. For example, the multiple instances of machine learning module 310 can be parallelized on a single processing device or computing device or parallelized across multiple processing devices or computing devices.
[0155] A computing device that implements machine learning module 310 or other aspects of the present disclosure can include a number of hardware components that enable performance of the techniques described herein. For example, a computing device can include one or more memory devices that store some or all of machine learning module 310. For example, machine learning module 310 can be a structured numerical representation that is stored in memory. The one or more memory devices can also include instructions for implementing machine learning module 310 or performing other operations. Example memory devices
include RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
[0156] A computing device can also include one or more processing devices that implement some or all of machine learning module 310 and/or perform other related operations.
Example processing devices include one or more of: a central processing unit (CPU); a visual processing unit (VPU); a graphics processing unit (GPU); a tensor processing unit (TPU); a neural processing unit (NPU); a neural processing engine; a core of a CPU, VPU, GPU, TPU, NPU or other processing device; an application specific integrated circuit (ASIC); a field programmable gate array (FPGA); a co-processor; a controller; or combinations of the processing devices described above. Processing devices can be embedded within other hardware components such as, for example, an image sensor, accelerometer, etc.
[0157] Hardware components (e.g., memory devices and/or processing devices) can be spread across multiple physically distributed computing devices and/or virtually distributed computing systems.
[0158] In some examples, machine learning module 310 described herein can be included in different portions of computer-readable code on a computing device. In one example, machine learning module 310 can be included in a particular application or program and used (e.g., exclusively) by such a particular application or program. Thus, in one example, a computing device can include a number of applications and one or more of such applications can contain its own respective machine learning library and machine-learned model(s).
[0159] In another example, machine learning module 310 described herein can be included in an operating system of a computing device (e.g., in a central intelligence layer of an operating system) and can be called or otherwise used by one or more applications that interact with the operating system. In some examples, each application can communicate with the central intelligence layer (and model(s) stored therein) using an application programming interface (API) (e.g., a common, public API across all applications).
[0160] In some examples, the central intelligence layer can communicate with a central device data layer. The central device data layer can be a centralized repository of data for the computing device. The central device data layer can communicate with a number of other components of the computing device, such as, for example, one or more sensors, a context manager, a device state component, and/or additional components. In some examples, the central device data layer can communicate with each device component using an API (e.g., a private API).
[0161] The technology discussed herein makes reference to servers, databases, software applications, and other computer-based systems, as well as actions taken and information sent to and from such systems. The inherent flexibility of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components. For instance, processes discussed herein can be implemented using a single device or component or multiple devices or components working in combination.
[0162] Databases and applications can be implemented on a single system or distributed across multiple systems. Distributed components can operate sequentially or in parallel.
[0163] In addition, the machine learning techniques described herein are readily interchangeable and combinable. Although certain example techniques have been described, many others exist and can be used in conjunction with aspects of the present disclosure.
[0164] Further to the descriptions above, a user may be provided with controls that enable the user to make an election as to both if and when systems, programs or features described herein may enable collection of user information (e.g., information about a user’s social network, social actions or activities, profession, a user’s preferences, or a user’s current location), and if the user is sent content or communications from a server. In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user’s identity may be treated so that no personally identifiable information can be determined for the user, or a user’s geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined. Thus, the user may have control over what information is collected about the user, how that information is used, and what information is provided to the user.
[0165] FIG. 3C is a conceptual diagram illustrating a machine learning module configured to apply a large language model that accepts natural language input and provides code for corresponding graphical user interfaces and application functionality as output, in accordance with one or more techniques of this disclosure. Machine learning module 310 of FIG. 3C may be an example of machine learning module 310 of FIGS. 3A and 3B. In general, ML module 310 can be or include one or more transformer-based neural networks, such as a large language model module 342. Language model module 342 may implement, for example, the Pathways Language Model developed by Google. Transformer-based neural networks may refer to a type of deep learning architecture specifically designed for handling sequential data, such as text or time series. In other words, transformer-based neural networks like LLMs
may be configured to perform natural language processing (NLP) tasks, such as question-answering, machine translation, text summarization, and sentiment analysis. Language model module 342 may be configured to perform tasks such as classification, sentiment analysis, entity extraction, extractive question answering, summarization, re-writing text in a different style, ad copy generation, and concept ideation.
[0166] Transformer-based neural networks may utilize a self-attention mechanism, which allows the model to weigh the importance of different elements in a given input sequence relative to each other. The self-attention mechanism may help language model module 342 effectively capture long-range dependencies and complex relationships between elements, such as words in a sentence.
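A hedged numpy sketch of scaled dot-product self-attention follows; the sequence length, embedding width, and random projections are toy assumptions rather than the disclosed model's dimensions.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Score every element of the sequence against every other element.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    # Each position becomes a weighted mix of all positions' values.
    return weights @ V

seq_len, d = 4, 8  # e.g., four tokens with eight-dimensional embeddings
X = np.random.randn(seq_len, d)
Wq, Wk, Wv = (np.random.randn(d, d) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)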
[0167] Language model module 342 may include an encoder and a decoder that operate to process and generate sequential data, such as structured text. Both the encoder and decoder may include one or more of self-attention mechanisms, position-wise feedforward networks, layer normalization, or residual connections. In some examples, the encoder may process an input sequence and create a representation that captures the relationships and context among the elements in the sequence. The decoder may then obtain the representation generated by the encoder and produce an output sequence. In some examples, the decoder may generate the output one element at a time (e.g., one word at a time), using a process called autoregressive decoding, where the previously generated elements are used as input to predict the next element in the sequence.
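The autoregressive decoding loop described above can be sketched as follows; next_token_distribution is a hypothetical stand-in for the decoder, returning a token-to-probability mapping conditioned on everything generated so far.

def decode(next_token_distribution, prompt_tokens, max_new=20, eos="<eos>"):
    sequence = list(prompt_tokens)
    for _ in range(max_new):
        probs = next_token_distribution(sequence)  # condition on all elements so far
        token = max(probs, key=probs.get)          # greedy choice of next element
        if token == eos:
            break
        sequence.append(token)                     # feed the output back as input
    return sequence

# Toy decoder that emits "done" once, then the end-of-sequence token.
toy = lambda seq: {"<eos>": 1.0} if seq[-1] == "done" else {"done": 0.9, "<eos>": 0.1}
print(decode(toy, ["start"]))  # ['start', 'done']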
[0168] In some examples, if user intent is unclear, machine learning module 310 may be unable to determine the user’s intent with high confidence. In such instances, instructions file 350, which includes the set of instructions, may include instructions for prompting the user to clarify their input. In these examples, when instructions file 350 is executed, the program may be paused for these disambiguation steps. In some examples, the prompt may include a list of options, a map, a question, etc. For example, if a user provides an input such as “Book appointment for Jane,” the computing system may generate a widget for the “Family” GUI that includes a map with multiple pediatrician locations, and the user may be prompted to clarify which specific pediatrician they would like to book Jane’s appointment at. In some examples, after the user clarifies their intent, instructions file 350 may continue to execute based on the clarified intent. In some examples, responsive to receiving clarified intent, feedback, and/or additional user input, the computing system may update instructions file 350 accordingly.
[0169] In general, language model module 342 may apply an LLM to the indication of the natural language user input to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories. In some examples, language model module 342 may apply an LLM to the indication of the natural language user input to identify the one or more categories. In some examples, language model module 342 may determine a set of information types included in the input (e.g., text or audio input or a transcription generated by speech-to-text module 226). An information type may be or otherwise include a topic, theme, point, subject, purpose, intent, keyword, etc. In some examples, language model module 342 may determine the information type by leveraging a self-attention mechanism to capture the relationships and dependencies between words in the input sequence. For example, language model module 342 may tokenize (e.g., split) a sequence of words or subwords, which language model module 342 may convert into vectors (e.g., numerical representations) that language model module 342 can process. Language model module 342 may use the self-attention mechanism to weigh the importance of each token in relation to the others. In this way, language model module 342 may identify patterns and relationships between the tokens, and in turn the words corresponding to the tokens, that indicate one or more information types of the accessibility information.
[0170] In general, language model module 342 may excel at performing NLP tasks, such as generating text and other content (e.g., new code that provides graphical components and functionality for performing one or more tasks). However, with respect to specific types of content (e.g., specific information types), language model module 342 may have an increased likelihood of generating false, inaccurate, or bad quality information. To address this issue, language model module 342 may be configured to exclude the generation of content or code relating to a set of excluded information types. For example, the set of excluded information types may include one or more of phone numbers, addresses, web addresses, functionality prohibited by an application, sensitive data (e.g., full bank account information), etc. Thus, input information may be passed to language model module 342 with certain prerequisites, prompts, or “rules” that can be stored in rules storage 344. Machine learning module 310 may apply these prerequisites, prompts, or rules when generating the set of instructions, or new code, associated with the functionality for performing the identified tasks and subtasks, and the corresponding GUIs and graphical components.
[0171] For example, machine learning module 310 may implement a rule such as, “Do not include user’s sensitive information” when generating instructions for generating a “Transfer Funds” widget that includes pre-populated input (e.g., instead of including a user’s full bank
account number, the pre-populated input may include a string such as, “Bank Account ending in 1234”). In some examples, machine learning module 210 may use accessibility information when generating code for GUIs and graphical components, such that the user can easily interact with the GUIs and graphical components. In some examples, the rules may be text inputs such as, for example, “Keep GUI headings short.” As such, rules storage 344 may store a plurality of text inputs and/or other data that further specify how instructions file 350 should be generated by machine learning module 310. For example, language model module 342 may be applied to the indication of the natural language user input in accordance with the one or more predefined rules stored in rules storage 344, which may include, for example, unauthorized terms, unauthorized class names, unauthorized dimensions of the graphical user interface, unauthorized application functionality, etc. Because language model module 342 can interpret the rules along with the input, the computing system may provide more accurate instructions for generating functionality and associated GUIs and graphical components for performing identified tasks. In this way, the computing system may be able to interpret natural language to understand user intents, and then write or generate new, robust, working code that satisfies the user intents, can perform calculations, and can render new graphical user interfaces or components at machine speed.
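The rule-constrained prompting described above might be assembled as in the following sketch; the rule strings, the build_prompt function, and the downstream generate_instructions call are hypothetical illustrations.

RULES = [
    "Do not include user's sensitive information.",
    "Keep GUI headings short.",
]

def build_prompt(user_input, rules=RULES):
    # Prepend the stored rules so the language model interprets them
    # together with the user input.
    rule_block = "\n".join(f"- {r}" for r in rules)
    return f"Rules:\n{rule_block}\n\nUser request: {user_input}"

prompt = build_prompt("Create a Transfer Funds widget")
# generate_instructions(prompt) would then produce the instructions file.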
[0172] While language model module 342 may be a transformer-based neural network in some examples, in other examples language model module 342 may be or otherwise include one or more other types of neural networks. For example, language model module 342 may be or include an autoencoder. In some examples, the aim of an autoencoder is to learn a representation (e.g., a lower-dimensional encoding) for a set of data, typically for the purpose of dimensionality reduction. For example, in some examples, an autoencoder can seek to encode the input data and then provide output data that reconstructs the input data from the encoding. In some examples, the autoencoder can include additional losses beyond reconstructing the input data.
[0173] Generally, large language models can be slow and expensive in terms of carbon, energy usage, and financial cost. Thus, in some examples, machine learning module 310 may minimize how often language model module 342 is invoked by caching the generated set of instructions, or new code, in instructions cache 348. In general, language model module 342 may use a prompt including user intent (e.g., the output from speech-to-text module 226 of FIG. 2) and any contextual information received by the computing system. More specifically, in some examples, prior to generating the set of instructions, the computing system may perform “memory injection,” which may be considered a process in which an identified task
may be passed to a system that can look up and append additional context to the task. As an example, identified tasks such as “Send money to Mike” and “Book Jane’s appointment,” may be passed as input to machine learning module 310, in which machine learning module 310 may determine if any relevant context information has been stored by the computing system. In this example, machine learning module 310 may determine the following relevant context information: “The user has a husband called Mike” and “The user has a daughter aged 3 called Jane.” Machine learning module 310 may then include both the user input (e.g., the identified tasks) and the relevant context information in a prompt, which may then be used to generate the set of instructions. In general, it should be understood that machine learning module 310 may implement one or more self-prompting or recursive prompting models, e.g., language model module 342 may generate prompts based on retrieved application information, user input, context information, etc., which may involve generating follow-up questions, inferences, or further instructions that can guide subsequent stages of processing.
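The memory-injection step might look like the following sketch; the context store, its contents, and the lookup heuristic are hypothetical stand-ins for whatever context the computing system has stored with user permission.

CONTEXT_STORE = {
    "Mike": "The user has a husband called Mike.",
    "Jane": "The user has a daughter aged 3 called Jane.",
}

def inject_memory(task):
    # Look up stored facts mentioned in the identified task and append
    # them to the prompt alongside the user input.
    facts = [fact for name, fact in CONTEXT_STORE.items() if name in task]
    context = " ".join(facts) if facts else "No stored context."
    return f"Task: {task}\nContext: {context}"

print(inject_memory("Book Jane's appointment"))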
[0174] Furthermore, a prompt may include one or more APIs from API module 206, in which the one or more APIs may then be included in instructions file 350. As such, instructions file 350 may include instructions for gathering more specific details or data at runtime (e.g., one or more task APIs may send requests to the one or more applications at runtime). In this way, portions of generated code may be reused. Specifically, machine learning module 310 may be configured to perform instruction embedding in which a representation (i.e., embedding) of frequently used or critical instructions are stored in instructions cache 348. In various examples, instructions file 350 may be generated based on the instructions stored in instructions cache 348 and any additional instructions, information, or updates retrieved by an API at runtime that are not present in instructions cache 348. For example, if a user provides an input such as “reschedule today’s meeting,” language model module 342 may generate a general set of instructions for rescheduling any meeting on any day and store the instructions in instructions cache 348. If the user provides the same “reschedule today’s meeting” command in the future, language model module 342 may generate instructions file 350 including the cached instructions and an API call that retrieves, e.g., calendar application data pertaining to the future date. Thus, instructions file 350 may provide functionality for rescheduling a specific meeting on the future date.
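The cache-then-specialize flow might be sketched as follows; generate_with_llm and fetch_runtime_data are hypothetical stand-ins for the language model call and the runtime API calls, respectively.

instructions_cache = {}

def get_instructions(command, generate_with_llm, fetch_runtime_data):
    if command not in instructions_cache:
        # Cache miss: the slow, expensive LLM invocation happens only here.
        instructions_cache[command] = generate_with_llm(command)
    general_instructions = instructions_cache[command]
    # Cached general instructions are combined with fresh data retrieved
    # via APIs at runtime (e.g., calendar entries for the requested date).
    runtime_data = fetch_runtime_data(command)
    return general_instructions, runtime_data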
[0175] By storing frequently used or critical instructions in instructions cache 348, machine learning module 310 may reuse the frequently used or critical instructions without having to invoke language model module 342 on data other than what is included in the prompt (e.g.,
language model module 342 may not have to re-apply the large language model to the information associated with the predefined functions included in the one or more applications). In some examples, the prompt may only include contextual information, and data indicative of user intent may be stored in instructions cache 348. In some examples, machine learning module 310 may apply code caching to both compiled and interpreted languages. Machine learning module 310 may implement various types of caching, such as, for example, Just-In-Time (JIT) compilation, Ahead-Of-Time (AOT) compilation, and bytecode caching.
[0176] As such, in general, machine learning module 310 may generate instructions file 350 using language model module 342, in which instructions file 350 may be generated based on one or more of application functionality, capabilities, and/or attributes included in retrieved application information, contextual information (e.g., user data), the natural language audio or text input received by the computing system, and/or the transcribed text output from a speech-to-text module. That is, using the information associated with the plurality of functions, a prompt may be generated by machine learning module 310, in which the prompt may specify output format, allowed data types, a UI component library that can be used to build the resulting UI, an API library including APIs that can be used to retrieve data from the applications at runtime, user input (e.g., the identified tasks), and context information. The prompt may then be provided to language model module 342 as input, in which language model module 342 may then generate instructions file 350 that includes code for accessing relevant device APIs and returning relevant UI components that provide functionality for performing tasks.
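One way such a structured prompt might be represented is sketched below; the field names, component names, and API names are assumptions for illustration only:

```python
# Hypothetical structured prompt passed to the language model. None of
# these field values are prescribed by this disclosure; they merely
# illustrate the categories of information the prompt may specify.

prompt = {
    "output_format": "JSON instructions file",
    "allowed_data_types": ["string", "number", "date", "currency"],
    "ui_component_library": ["widget", "button", "text_entry", "slider", "map"],
    "api_library": ["banking.transfer", "calendar.book", "contacts.lookup"],
    "user_tasks": ["Send money to Mike", "Book Jane's appointment"],
    "context": [
        "The user has a husband called Mike.",
        "The user has a daughter aged 3 called Jane.",
    ],
}
# instructions_file = language_model.generate(prompt)  # hypothetical call
```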
[0177] The prompt(s) used to generate instructions file 350 may be used by machine learning module 310 to determine whether a user’s desired application functionality or “new” application functionality is possible or within reason (e.g., a task widget may not be associated with functionality for transferring funds if the user does not have a banking application downloaded on their user device). In general, the set of instructions, and/or the “generated,” “desired,” or “new” functionality described herein may be defined as functionality or code that is dynamically generated by machine learning module 310 on the basis of the retrieved information associated with predefined application functionality, user input, and/or other information retrieved from a user computing device. In some examples, the at least one function for performing a respective task may include a combination of data and/or predefined application functionality retrieved from different applications. In general, the graphical components, e.g., task widgets, associated with the at least one function for
performing a respective task may provide a “shortcut” for completing the respective task. For example, instead of requiring a user to navigate through a messaging application to find out how much money the user needs to send to Mike, and then having to navigate through a banking application to search for Mike’s banking account username, initiate a new transfer, and manually enter all relevant information, a single task widget may provide the user the functionality for performing all of the aforementioned actions, e.g., automatically, or with one or more simple clicks or interactions with the task widget.
[0178] In some examples, instructions file 350 may include the instructions for generating the at least one GUI associated with a respective category, in which the at least one GUI includes at least one graphical component associated with at least one function for performing a respective task. Instructions file 350 may also be stored in a memory of the computing system such that instructions file 350 can be resent, updated, or sent to a computing device associated with the user, one or more other computing devices associated with the user, and/or, in some examples, with explicit consent from the user, one or more other computing devices associated with one or more other users. As an example, the computing system may receive, from a computing device, a request to send instructions file 350 to a companion device associated with the computing device, in which the computing system may then send, to the companion device, instructions file 350.
[0179] In some examples, instructions file 350 may include all data collected or used by the computing system to generate instructions file 350. For example, instructions file 350 may include details for how the user's natural language was resolved into working code. In some examples, users may be able to view or “inspect” instructions file 350. In other words, a user may be provided various controls to clarify, inspect, or stop a task to ensure that the computing system is following the user’s intent. Thus, the generated GUIs and/or graphical components may be inspectable, in which users can, for example, interact with widgets to see the associated data, code or instructions (e.g., instructions file 350), or pinch to expand widgets to reveal more controls. Furthermore, a user may be able to edit instructions file 350. For example, a user may edit the intent or parameters used by machine learning module 310, and instructions file 350 may be updated to reflect the edits. Furthermore, as described further below, users may interact with the GUIs and/or graphical components to add or delete GUIs and/or graphical components, directly edit parameters, edit the order of the GUIs, the positioning of the graphical components, change, add, or delete visual effects, etc. As such, any predetermined or suggested input determined by machine learning module 310, the functionality generated by machine learning module 310, the GUIs and graphical
components, and any other data included in instructions file 350 may be customizable or user-configurable.
[0180] In general, by leveraging language model module 342, the user interface generation provided by the computing system may require less time and/or effort to create new functionality and/or graphical user interfaces and components for performing a user’s identified tasks. That is, instead of users having to remember multiple tasks and navigate through multiple applications and user interfaces to access relevant information and functionality for performing their multiple tasks, the techniques of this disclosure may provide users the ability to quickly have their tasks organized into GUIs by simply providing natural language input. Furthermore, the organized GUIs may further provide users the ability to quickly perform their tasks, as doing so may only require a user to simply interact with a single widget.
[0181] FIG. 4 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure. Computing system 400 may be similar if not substantially similar to computing system 100 of FIG. 1 and computing system 200 of FIG. 2. Computing device 412 may be similar if not substantially similar to computing device 112 of FIG. 1. User interface (UI) components 402 may be similar if not substantially similar to UI components 102 of FIG. 1. Network 401 may be similar if not substantially similar to network 101 of FIG. 1. Furthermore, some or all of the techniques described with respect to computing system 400 may be implemented locally on computing device 412.
[0182] In the example of FIG. 4, with explicit consent from a user associated with computing device 412, computing system 400 may retrieve, using API module 406, information associated with predefined functions included in one or more applications executing at computing device 412. With explicit consent from the user, computing system 400 may also retrieve, using API module 406, other data and/or context information from computing device 412, such as historical user data, device data, user activity data, etc. In one example, UI module 404 may receive, from computing device 412, an indication of a natural language input such as “Send money to Mike, book Jane’s appointment, plan trip with John...,” in which the input is associated with one or more predefined functions included in the one or more applications. In this example, sending money, selecting a recipient for the money, booking an appointment, purchasing a flight, browsing the Internet, sending a message, selecting a recipient for the message, etc., may be examples of predefined
functionality for a banking application, a healthcare application, an airline application, a web browser application, and a messaging application executing on computing device 412.
[0183] In general, machine learning module 410 may apply a language model to the indication of the example natural language user input to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories. For example, machine learning module 410 may identify a first task, “Send money to Mike,” a second task, “Book Jane’s appointment,” and a third task, “Plan trip with John.” Machine learning module 410 may determine, based on the retrieved information, data, and/or context information from computing device 412, that the first and second tasks are associated with a “Family” category. More specifically, in this example, data retrieved from computing device 412 may indicate that Mike and Jane are family members of the user, and that Jane is a child. Furthermore, a text message received from Mike R. that states, “Can you send me $20?”, and another text message received from another family member that states, “Can you book Jane’s doctor’s appointment for next week?” may further provide additional context information for determining user intent. In this example, computing system 400 may determine, based on the text message, that the “Mike” referred to in the user’s input is specifically Mike R., and not Mike B., Mike C., or Mike L., as the text message was received from Mike R. As such, based on data retrieved from computing device 412, machine learning module 410 may determine that performing the first task requires functionality for sending $20 from the user’s preferred bank account to Mike R.’s banking account. Machine learning module 410 may determine that performing the second task requires functionality for booking an appointment for Jane at a local pediatrician next week at a time outside of the user’s scheduled meetings. Therefore, performing each task may require functionality from multiple different applications.
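The contact disambiguation step described above can be illustrated with the following sketch; the contact list, message data, and helper function are invented for illustration and do not represent the actual implementation:

```python
# Hypothetical sketch: resolving an ambiguous contact name ("Mike") using
# a recent message, then tagging the task with a category and parameters.

contacts = ["Mike R.", "Mike B.", "Mike C.", "Mike L."]
recent_messages = [
    {"sender": "Mike R.", "text": "Can you send me $20?"},
]

def resolve_contact(name: str) -> str:
    """Prefer a matching contact who recently messaged the user."""
    candidates = [c for c in contacts if c.startswith(name)]
    for message in recent_messages:
        if message["sender"] in candidates:
            return message["sender"]
    return candidates[0] if candidates else name

task = {
    "text": "Send money to Mike",
    "category": "Family",                  # from retrieved family data
    "recipient": resolve_contact("Mike"),  # -> "Mike R."
    "amount": 20.00,                       # inferred from the message
}
```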
[0184] In general, UI generator module 408 may apply, using the information associated with the plurality of functions, machine learning module 410 to the first task and the second task to generate a set of instructions. The set of instructions may be considered dynamic and may be generated at runtime based on user input and retrieved information. Furthermore, the set of instructions may combine data retrieved from the one or more applications, such that a user may complete a task without having to navigate through the multiple associated applications. The set of instructions may provide at least one function for performing a respective task from the one or more tasks. Continuing the example above, the set of instructions may provide at least one function for sending $20 from the user’s preferred bank account to Mike
R.’s banking account, and at least one function for booking an appointment for Jane at a local pediatrician next week at a time outside of the user’s scheduled meetings.
[0185] In some examples, the at least one graphical user interface associated with the respective category includes one or more of at least one graphical component including text data associated with the respective category, at least one graphical component including text data associated with information from the one or more applications, at least one graphical component associated with one or more suggested inputs, and at least one suggested graphical component associated with the at least one function for performing the respective task. In some examples, the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
[0186] For example, in the example of FIG. 4, the set of instructions may include instructions for generating GUI 417 associated with the “Family” category, which is demonstrated in FIG. 4 with text data 451 (“FAMILY”), which may be considered a GUI header. As further shown in the example of FIG. 4, GUI 417 associated with the “Family” category includes widget 452, titled “Send Money to Mike,” which is associated with the at least one function for sending $20 from the user’s preferred bank account to Mike R.’s banking account. Specifically, widget 452, as shown, includes prepopulated text entry fields, such as “pay” text entry field 453 that is prepopulated with an input of “$20.00,” “from” text entry field 447 that is prepopulated with an input of “Checking Acct 1234,” and “to” text entry field 449 that is prepopulated with an input of “Mike R.” As further shown, widget 452 includes “Send” button 454, which may be configured to provide the generated functionality that sends $20 from the user’s bank account ending in 1234 to Mike R.’s banking account upon the user interacting with “Send” button 454.
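One way the instructions for a widget such as widget 452 might be expressed as declarative data is sketched below; the schema and the "banking.transfer" API name are assumptions of this sketch, not the actual instructions format:

```python
# Hypothetical declarative instructions for widget 452: prepopulated
# fields plus an action bound to a task API invoked on interaction.

widget_452 = {
    "id": "widget_452",
    "title": "Send Money to Mike",
    "fields": [
        {"name": "pay",  "type": "currency", "prefill": "$20.00"},
        {"name": "from", "type": "account",  "prefill": "Checking Acct 1234"},
        {"name": "to",   "type": "contact",  "prefill": "Mike R."},
    ],
    "actions": [
        # Invoked when the user taps the "Send" button (button 454).
        {"label": "Send", "api": "banking.transfer",
         "args": {"amount": 20.00, "from_account": "1234", "to": "Mike R."}},
    ],
}
```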
[0187] As further shown in the example of FIG. 4, GUI 417 may include widget 455 titled “Book Jane’s Appointment,” which is associated with the at least one function for booking an appointment for Jane at a local pediatrician next week at a time outside of the user’s scheduled meetings. However, in some examples, such as this example, machine learning module 410 may not have enough data to determine the user’s intent with high confidence. Therefore, as shown in the example of FIG. 4, widget 455 includes text prompt 443 “Which pediatrician?”, map 445 showcasing the locations of pediatrician A, pediatrician B, and pediatrician C, and one or more suggested inputs, which are shown as buttons 456 that each
correspond to a specific pediatrician. As such, in this example, the set of instructions includes instructions for prompting the user to clarify which suggested pediatrician they would like to book Jane’s appointment at. Responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs (e.g., selecting the button that corresponds to Pediatrician A), computing system 400 may update widget 455.
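A sketch of this clarification flow, with an invented widget state model, might look as follows; the structures shown are illustrative assumptions only:

```python
# Hypothetical sketch: widget 455 starts in a "needs clarification" state,
# and selecting a suggested input folds the choice back into the
# instructions, producing an updated widget.

widget_455 = {
    "id": "widget_455",
    "title": "Book Jane's Appointment",
    "prompt": "Which pediatrician?",
    "suggestions": ["Pediatrician A", "Pediatrician B", "Pediatrician C"],
    "state": "awaiting_clarification",
}

def on_suggestion_selected(widget: dict, choice: str) -> dict:
    """Return updated widget instructions reflecting the user's selection."""
    updated = dict(widget)
    updated["selected"] = choice
    updated["state"] = "ready"
    updated["prompt"] = None
    updated["suggestions"] = []
    return updated

widget_455 = on_suggestion_selected(widget_455, "Pediatrician A")
```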
[0188] In this way, rather than having to navigate through multiple applications to perform the tasks of sending money to Mike and booking an appointment for Jane, a user may simply say their intent for performing such tasks, and UI generator module 408 may generate instructions for generating GUI 417 that provides the user the ability to perform their tasks in a quick and organized manner, e.g., by simply interacting with widget 452 and widget 455. As such, the user may find it easier to complete tasks, and may enjoy an overall improved user experience.
[0189] FIG. 5 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure. Computing system 500 may be similar if not substantially similar to computing system 100 of FIG. 1, computing system 200 of FIG. 2, and computing system 400 of FIG. 4. Computing device 512 may be similar if not substantially similar to computing device 112 of FIG. 1 and computing device 412 of FIG. 4. User interface (UI) components 502 may be similar if not substantially similar to UI components 102 of FIG. 1 and UI components 402 of FIG. 4. Network 501 may be similar if not substantially similar to network 101 of FIG. 1 and network 401 of FIG. 4. Furthermore, some or all of the techniques described with respect to computing system 500 may be implemented locally on computing device 512.
[0190] FIG. 5 includes widget 557 that indicates the first task of sending $20 to Mike was completed (e.g., as shown, widget 557 may include a check mark), and widget 558 that indicates the second task of booking Jane’s appointment was completed (e.g., as shown, widget 558 may include a check mark). As such, in some examples, responsive to a user interacting with one or more graphical components associated with the at least one function for performing a respective task (that is, computing device 512 executes the at least one function for performing a respective task), computing system 500 may update the set of instructions to include instructions for generating updated graphical components, in which the updated graphical components indicate that the respective task was performed.
[0191] As further shown in the example of FIG. 5, “Family” GUI 517 may include widget 559, titled “Order Jersey,” which may be generated based on, for example, context information retrieved from computing device 512, such as a text message received that states, “Can you order a jersey for Jack?” In this example, the text message and other context information (e.g., historical user data, data retrieved from the one or more applications executing at computing device 512, etc.) may be used by computing system 500 to determine a task of ordering a jersey for Jack, in which computing system 500 may further determine that Jack is another family member of the user, a child, has a preference for a specific football team, wears a specific size, etc., and that the user has historically preferred to purchase similar items within a specific price range. As such, in some examples, with explicit consent from a user, computing system 500 may be configured to determine tasks that are not explicitly included in an indication of a natural language user input, but rather are determined based on the context information retrieved from computing device 512. Furthermore, in some examples, while the set of instructions may include instructions for generating the category GUIs on the basis of a level of importance, the graphical components included in each GUI may also be generated on the basis of a level of importance or priority. In this example, because the indication of the natural language user input explicitly included the tasks of sending $20 to Mike and booking an appointment for Jane, the widgets associated with each of those tasks may be displayed first, e.g., at a top portion of GUI 517, because those tasks may be assigned a higher level of priority. On the other hand, widget 559 associated with the task of ordering Jack a jersey, which is a task that was not explicitly included in the indication of the natural language user input, may be assigned a lower level of priority, and therefore may be displayed last, e.g., at a bottom portion of GUI 517. Furthermore, it should be noted that the example GUIs and/or graphical components described herein may be scrollable, such that, e.g., GUI 517 may include any number of graphical components corresponding to functionality for performing one or more tasks.
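The priority-based layout described above can be sketched as a simple sort; the priority values and widget data are invented for illustration:

```python
# Hypothetical sketch: tasks stated explicitly in the user's input outrank
# tasks inferred from context, so their widgets render first on the GUI.

widgets = [
    {"title": "Order Jersey",            "source": "inferred"},
    {"title": "Send Money to Mike",      "source": "explicit"},
    {"title": "Book Jane's Appointment", "source": "explicit"},
]

PRIORITY = {"explicit": 0, "inferred": 1}  # lower value sorts first

ordered = sorted(widgets, key=lambda w: PRIORITY[w["source"]])
# Explicit task widgets land at the top of GUI 517; the inferred
# "Order Jersey" widget lands at the bottom.
```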
[0192] Similar to the tasks described above, machine learning module 510 may determine that the task of ordering Jack a jersey may require functionality from multiple different applications, such as functionality for browsing the web, functionality for determining Jack’s preferred size, functionality for completing a transaction, etc. Therefore, the set of instructions may provide at least one function for, e.g., purchasing a youth size small jersey for a specific football team that is within a price range of $5.00-$15.00. In general, parameters involved for performing a task may be pre-populated by UI generator module 508, e.g., based on context information, historical user data, application information, etc.
retrieved from computing device 512. As shown in the example of FIG. 5, widget 559 may display one or more suggested inputs, such as a suggested jersey for the specific football team chosen based on a suggested size parameter of “Youth Size S” and a suggested price parameter of “$9.99.” Widget 559 may further include “Buy” button 560, which a user may interact with to perform the task of ordering the jersey. Furthermore, as shown, widget 559 may include one or more user-configurable controls, such as “Price” slider 561, which a user may interact with to set a preferred price range. In some examples, responsive to the user interacting with slider 561 to set a different price range, and/or interacting with any other graphical component to change or set any other parameters, computing system 500 may update the set of instructions to include instructions for generating an updated widget 559 that includes functionality for performing the task with the updated parameters. Additionally, as shown, widget 559 may be scrollable, such that a user may swipe horizontally to discover other suggested jerseys for sale.
[0193] In this way, computing system 500 may provide users the ability to perform tasks, even when a user may not be “thinking of” the tasks, e.g., may not explicitly say their intent for completing such tasks. Furthermore, in general, the graphical components, e.g., widgets, associated with the generated functionality for performing tasks may be positioned on a category GUI based on a level of priority, such that users may be presented with higher priority task widgets first, and may complete their tasks in a more organized manner.
[0194] FIG. 6 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure. Computing system 600 may be similar if not substantially similar to computing system 100 of FIG. 1, computing system 200 of FIG. 2, computing system 400 of FIG. 4, and computing system 500 of FIG. 5. Computing device 612 may be similar if not substantially similar to computing device 112 of FIG. 1, computing device 412 of FIG. 4, and computing device 512 of FIG. 5. User interface (UI) components 602 may be similar if not substantially similar to UI components 102 of FIG. 1, UI components 402 of FIG. 4, and UI components 502 of FIG. 5. Network 601 may be similar if not substantially similar to network 101 of FIG. 1, network 401 of FIG. 4, and network 501 of FIG. 5. Furthermore, some or all of the techniques described with respect to computing system 600 may be implemented locally on computing device 612.
[0195] As shown in the example of FIG. 6, in some examples, responsive to a user interacting with one or more user interface components to perform a task, computing system
600 may update the set of instructions to include instructions for generating updated graphical components that indicate a respective task was performed, display relevant information pertaining to the completed task, and/or provide functionality for performing the task again, e.g., with different parameters. For example, responsive to the user selecting button 456 of FIG. 4 that corresponds to Pediatrician A, computing system 600 may update the set of instructions to include instructions for generating widget 661 in place of widget 455. As shown, widget 661, titled “Reschedule Appointment,” may provide information pertaining to the completed task of booking Jane’s appointment, such as the date (e.g., “Friday Oct 15, 2024”), a summary of the event (“Jane at Doctor’s”), time (“1:10 PM-1:30 PM”), and specific pediatrician (“Pediatrician A”) for which the appointment was booked. That is, widget 661 may provide a “reminder” to the user about Jane’s appointment, and, as shown, may be displayed at a top portion of GUI 617, as widget 661 may be assigned a higher level of priority than, for example, widget 659 (which may be similar if not substantially similar to widget 559 of FIG. 5). Furthermore, widget 661 may include “Call Doctor” button 662, which may be associated with at least one function for performing a task of rescheduling Jane’s doctor’s appointment. As such, in some examples, machine learning module 610 may intuitively determine additional actions or subtasks associated with a task that may be performed after the task is completed. In this way, users may not be required to explicitly provide input indicating their desire to perform such additional actions or subtasks; instead, computing system 600 may automatically determine the additional actions or subtasks, e.g., based on information retrieved from computing device 612, and automatically generate graphical components associated with functionality for completing the additional actions or subtasks.
[0196] In some examples, a user may provide one or more additional indications of a natural language input, from which machine learning module 610 may identify one or more tasks associated with and/or not associated with the one or more predetermined categories. As further shown in the example of FIG. 6, the user may provide an additional indication of a natural language input that includes the utterance, “How many days until the kids start school?” UI generator module 608 may apply machine learning module 610 to this additional indication and determine a task of determining how many days there are until the user’s children start their next school year, which machine learning module 610 may further determine to be associated with the “Family” category. UI generator module 608 may generate a set of instructions that include instructions for generating widget 663, titled “Days until school starts,” and may include functionality for counting down the days until the user’s
children start their next school year (e.g., based on information retrieved from a calendar application, etc.). As such, in some examples, some graphical components may not require a user to interact with the graphical components to perform a particular task. That is, in some examples, one or more graphical components may simply provide relevant information, reminders, notes, etc. that may answer a user’s query or intent.
[0197] FIG. 7A is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure. Computing system 700 may be similar if not substantially similar to computing system 100 of FIG. 1, computing system 200 of FIG. 2, computing system 400 of FIG. 4, computing system 500 of FIG. 5, and computing system 600 of FIG. 6. Computing device 712 may be similar if not substantially similar to computing device 112 of FIG. 1, computing device 412 of FIG. 4, computing device 512 of FIG. 5, and computing device 612 of FIG. 6. User interface (UI) components 702 may be similar if not substantially similar to UI components 102 of FIG. 1, UI components 402 of FIG. 4, UI components 502 of FIG. 5, and UI components 602 of FIG. 6. Network 701 may be similar if not substantially similar to network 101 of FIG. 1, network 401 of FIG. 4, network 501 of FIG. 5, and network 601 of FIG. 6. Furthermore, some or all of the techniques described with respect to computing system 700 may be implemented locally on computing device 712.
[0198] In one example, the indication of a natural language user input may include an identified task such as “Plan the trip with John,” which may involve multiple actions or subtasks to complete. In the example of FIG. 7A, machine learning module 710 may determine, based on the indication of the natural language input, a “Trip” category, in which the set of instructions may include instructions for generating GUI 766 associated with the “Trip” category (demonstrated in FIG. 7A with text data 765 (“TRIP”), which may be considered a GUI header). One example subtask determined by machine learning module 710 may be a subtask of booking an accommodation. As shown in the example of FIG. 7A, GUI 766 may include widget 767A, titled “Book Accommodation,” which may include one or more suggested accommodations based on one or more suggested input parameters. In some examples, the at least one graphical component (e.g., widget 767A) generated by the set of instructions includes a first graphical component and a second graphical component, in which the first graphical component is associated with a first function for performing a respective task, and the second graphical component is associated with a second function for performing the respective task. For example, as shown, widget 767A may include additional graphical
components, e.g., “sub-widgets,” such as sub-widget 768, titled “Budget,” which includes slider 769, and sub-widget 770, titled “Distance,” which includes draggable circle (i.e., radius selector) 771, in which slider 769 and draggable circle 771 may be considered user-configurable controls that provide functionality for tasks such as setting a price range and setting a location radius. In some examples, based on information retrieved from computing device 712, sub-widget 768 may display a suggested price range, and sub-widget 770 may display a suggested location and search radius, in which the suggested parameters may result in one or more suggested graphical components. For example, the suggested parameters resulted in suggested sub-widget 781 corresponding to a suggested “Cottage” accommodation, suggested sub-widget 782 corresponding to a suggested “Hotel” accommodation, and suggested sub-widget 783 corresponding to a suggested “Family Home” accommodation. In general, the user may interact with slider 769 to adjust the price range parameters, and may interact with draggable circle 771 to adjust the location and radius for the search, in which computing system 700 may then update the set of instructions to include instructions for generating an updated widget 767B (shown in FIG. 7B) that displays suggestions based on the updated parameters.
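How the budget and distance parameters might constrain the suggested accommodations can be sketched as a simple filter; the candidate data and parameter values below are invented for illustration:

```python
# Hypothetical sketch: suggestions surviving the budget slider (769) and
# distance radius selector (771) constraints.

candidates = [
    {"name": "Cottage",     "price": 120, "distance_km": 8},
    {"name": "Hotel",       "price": 150, "distance_km": 2},
    {"name": "Family Home", "price": 90,  "distance_km": 12},
    {"name": "Penthouse",   "price": 400, "distance_km": 1},
]

def suggest(budget: tuple[float, float], radius_km: float) -> list[dict]:
    low, high = budget
    return [c for c in candidates
            if low <= c["price"] <= high and c["distance_km"] <= radius_km]

suggestions = suggest(budget=(80, 200), radius_km=15)
# -> Cottage, Hotel, and Family Home survive; the Penthouse is filtered out.
# Dragging the slider or radius would re-run the filter and regenerate the
# suggested sub-widgets.
```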
[0199] FIG. 7B is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure. As shown in the example of FIG. 7B, widget 767B includes sub-widget 772, titled “Vibes,” which further includes at least one keyword, such as “Quiet” keyword 773, “Unique” keyword 784, and “Cozy” keyword 785, and at least one user-configurable control, such as draggable circles 774 that each correspond to a keyword. In the example of FIG. 7B, the suggested sub-widgets 781, 782, and 783 each corresponding to a suggested accommodation may be based on one or more of the at least one keyword and at least one user-configurable control. That is, as shown in the example of FIG. 7B, the suggested sub-widgets 781, 782, and 783 each corresponding to a suggested accommodation may be based on the user-configurable budget parameter set by sub-widget 768 and user-configurable distance parameter set by sub-widget 770 (which, in this example, may be collapsible/expandable widgets), keywords 773, 784, and 785, and user-configurable controls corresponding to a keyword, such as draggable circles 774. In some examples, sub-widget 772 may be considered a “tension triangle” configured to alter parameters based on user interaction with the triangle. For example, computing system 700 may receive an indication of a user input associated with one or more of the at least one keyword
and at least one user-configurable control, e.g., the user may interact with a draggable circle 774 to slide the draggable circle 774 closer to or farther from the corresponding “Quiet” keyword 773. In this example, a user may indicate a level of importance for each keyword based on the distance from the keyword at which the user-configurable control (e.g., the draggable circle) is set. For example, if a user prioritizes accommodations that are described as “Quiet,” the user may interact with a draggable circle 774 to slide the draggable circle 774 closer to the corresponding “Quiet” keyword 773. As such, computing system 700 may receive the indication of this user input, and update, based on the indication of the user input, the at least one suggested graphical component. That is, computing system 700 may update the set of instructions to include instructions for generating one or more suggested graphical components that each correspond to an updated suggestion. In this example, suggested sub-widgets 781, 782, and 783 each corresponding to a suggested accommodation may be removed, replaced, and/or updated based on the updated level of importance assigned to keywords 773, 784, and 785. Furthermore, in some examples, the user may edit the keywords, e.g., by tapping on the graphical component associated with a keyword to change the keyword to a different keyword, by typing a different keyword, and/or providing additional natural language input such as, e.g., “Find me accommodations that are modern, minimalist, and spacious.” Computing system 700 may then replace suggested sub-widgets 781, 782, and 783 with suggested sub-widgets that each correspond to a new suggested accommodation determined to be associated with the new keywords.
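The “tension triangle” behavior can be sketched as distance-based keyword weighting followed by re-ranking; the weighting formula, per-keyword ratings, and all values below are assumptions made for illustration:

```python
# Hypothetical sketch: the closer a draggable circle sits to a keyword,
# the higher that keyword's weight, and suggestions are re-ranked by their
# weighted keyword scores.

def keyword_weights(distances: dict[str, float]) -> dict[str, float]:
    """Convert drag distances (smaller = more important) into weights."""
    inverted = {k: 1.0 / (1.0 + d) for k, d in distances.items()}
    total = sum(inverted.values())
    return {k: v / total for k, v in inverted.items()}

def score(accommodation: dict, weights: dict[str, float]) -> float:
    # Each accommodation carries invented per-keyword ratings in [0, 1].
    return sum(weights[k] * accommodation["ratings"].get(k, 0.0)
               for k in weights)

# The circle for "Quiet" has been dragged close (small distance).
weights = keyword_weights({"Quiet": 0.2, "Unique": 1.0, "Cozy": 1.0})
accommodations = [
    {"name": "Cottage", "ratings": {"Quiet": 0.9, "Unique": 0.6, "Cozy": 0.8}},
    {"name": "Hotel",   "ratings": {"Quiet": 0.3, "Unique": 0.4, "Cozy": 0.5}},
]
ranked = sorted(accommodations, key=lambda a: score(a, weights), reverse=True)
# -> the quiet Cottage now ranks above the Hotel.
```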
[0200] As such, in general, while the GUIs and graphical components generated from the set of instructions may be associated with generated functionality that can help complete a user’s tasks in a “shortcut” manner, the GUIs and graphical components can be fine-tuned and customized by the user, in that the user can quickly and easily change suggested input parameters for completing their tasks. Furthermore, the design and layout of the category GUIs and graphical components (e.g., widgets) may present relevant task information and actions in a more concise, less busy, and more organized manner, which may make performing tasks feel less overwhelming.
[0201] FIG. 8 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality for performing tasks associated with a category, in accordance with one or more techniques of this disclosure. Computing system 800 may be similar if not substantially similar to computing system 100 of FIG. 1, computing system 200 of FIG. 2, computing system 400 of FIG. 4, computing system 500 of FIG. 5, computing system 600 of FIG. 6 and computing
system 700 of FIG. 7. Computing device 812 may be similar if not substantially similar to computing device 112 of FIG. 1, computing device 412 of FIG. 4, computing device 512 of FIG. 5, computing device 612 of FIG. 6, and computing device 712 of FIG. 7. User interface (UI) components 802 may be similar if not substantially similar to UI components 102 of FIG. 1, UI components 402 of FIG. 4, UI components 502 of FIG. 5, UI components 602 of FIG. 6, and UI components 702 of FIG. 7. Network 801 may be similar if not substantially similar to network 101 of FIG. 1, network 401 of FIG. 4, network 501 of FIG. 5, network 601 of FIG. 6, and network 701 of FIG. 7. Furthermore, some or all of the techniques described with respect to computing system 800 may be implemented locally on computing device 812. [0202] In some examples, computing system 800 may receive one or more of an additional indication of a user input and context information from the one or more applications, in which computing system 800 may update, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface. For example, in the example of FIG. 8, UI components 802 may display GUI 866 (which may be similar if not substantially similar to GUI 766 of FIGS. 7A and 7B), in which GUI 866 includes widget 876 that displays a message from John (“JR”) saying “I’ve bought our flights.” That is, in some examples, the set of instructions including instructions for generating GUI 866 and one or more graphical components may be updated based on new context information retrieved from computing device 812. For example, computing system 800 may retrieve information related to the message received from John R., and may generate instructions for generating widget 876. Furthermore, computing system 800 may generate widget 877, titled “Days until trip,” which may display the number of days until the user’s trip starts. As shown, responsive to the user interacting with a suggested graphical component to select a suggestion and perform a task, e.g., responsive to a user selecting suggested sub-widget 781 of FIG. 7A to perform the task of booking the suggested cottage accommodation, computing system 800 may update the set of instructions to include instructions for generating widget 878, titled “Cottage Booked,” which may replace widget 767A of FIG. 7A and/or widget 767B of FIG. 7B. As shown, widget 878 may provide information pertaining to the completed task of booking the cottage accommodation, such as the dates (e.g., “Nov 1st - Nov 7th”), the address of the accommodation (“123 Fifth Street”), and other relevant information, such as the check-in time (“Check-In at 12 PM”). That is, widget 878 may provide a “reminder” to the user about the booked accommodation, and, as shown, may include “Message Host” button 879, which may be associated with at least one function for performing a task of messaging the host of the cottage.
[0203] Furthermore, as shown in the example of FIG. 8, in some examples, GUI 866 may include text summary 875 generated by machine learning module 810, which may provide a summary of the tasks and/or any retrieved information associated with the category, such as tasks or subtasks that have been completed, tasks or subtasks that have not been completed, relevant information retrieved from computing device 812, etc. For example, text summary 875 includes the sentences, “Flights and accommodations have been booked. John has indicated the sights he would like to see,” in which “the sights he would like to see” is underlined to demonstrate an embedded hyperlink or deep link generated based on the information retrieved from, e.g., a messaging application that includes relevant messages from John R. As such, in general, with explicit user consent, computing system 800 may continuously or periodically retrieve information from computing device 812, such as existing functionality and context information (e.g., received messages, notifications, etc.) from applications executing on computing device 812, and generate the set of instructions to include instructions for generating new or updated GUIs and graphical components that are based on the retrieved information. In this way, users may collaborate with other users to complete tasks, and computing system 800 may prevent users from performing subtasks that have already been completed (e.g., computing system 800 may prevent the user from purchasing airline tickets when John has already purchased them).
[0204] In some examples, computing system 800 may generate suggested graphical components associated with functionality for performing subtasks that computing system 800 determines to be relevant to a larger task. For example, in the example of FIG. 8, computing system 800 may generate a set of instructions that includes instructions for generating suggested widget 880 titled “Restaurants in the area,” which may be associated with functionality for booking a dinner reservation at a restaurant in a location that is close to the address for the booked cottage accommodation. As shown, suggested widget 880 may be “grayed-out”; that is, computing system 800 may have determined suggested widget 880 to be associated with a lower level of priority. Suggested widget 880 may be displayed at a bottom portion of GUI 866, and the functionality of suggested widget 880 may only be implemented responsive to a user interacting with suggested widget 880 so as to “accept” the suggestion (e.g., the user may click on suggested widget 880 to activate or enable suggested widget 880). As shown, suggested widget 880 may be a scrollable widget that includes one or more suggested sub-widgets each corresponding to a suggested restaurant, in which the one or more suggested restaurants may be determined by computing system 800 based on information retrieved from computing device 812 (e.g.,
historical user data or preferences that indicate a user’s preferred genre of food, restaurant rating, budget, etc.).
[0205] As such, in general, users may simply provide natural language input that includes various user intents, computing system 800 may identify tasks based on the user intents, and computing system 800 may generate instructions for generating categorized GUIs, in which the categorized GUIs include organized task widgets that are prepopulated with suggested input and provide functionality for performing the identified tasks. Furthermore, computing system 800 may utilize information retrieved from computing device 812 to update, fine-tune, and intuitively determine the GUIs and task widgets for completing a user’s tasks. In this way, the techniques described herein may reduce the mental load, complexity, and time required for users to complete various tasks, and therefore may provide an overall improved user experience when operating user devices.
[0206] FIG. 9 is a conceptual diagram illustrating another example computing system for sending an output including a graphical user interface and application functionality to a companion device, in accordance with one or more techniques of this disclosure.
[0207] In the example of FIG. 9, a user 920 interacts with computing device 912 that is in communication with computing system 900. In some examples, some or all of the components and/or functionality attributed to computing system 900 may be implemented or performed by computing device 912. Computing system 900 may be similar if not substantially similar to computing system 100 of FIG. 1 and computing system 200 of FIG. 2. Computing device 912 may be similar if not substantially similar to computing device 112 of FIG. 1. GUI 916 may be similar if not substantially similar to GUI 116 of FIG. 1. User interface (UI) components 902 may be similar if not substantially similar to UI components 102 of FIG. 1 and UI components 202 of FIG. 2. Network 901 may be similar if not substantially similar to network 101 of FIG. 1. FIG. 9 further includes a companion device 981 including UI components 982. As shown in the example of FIG. 9, companion device 981 is in communication with computing system 900 and computing device 912 via network 901. GUI 916 may include a number of application widgets, such as application widgets 915F-915I.
[0208] In some examples, the techniques described herein may also provide users a “shortcut” to desired application functionality for a single application. Computing system 900 may be configured to generate instructions for performing an action that can be performed via, for example, a touch and talk feature (described in more detail below), rather than by the user navigating through the application. For example, a user may provide an input such as
“Put a star emoji next to Thursday’s meeting,” while holding down on a calendar app widget. Many applications and GUIs executed on computing devices are often limited by design space. That is, in the instance of the calendar application, multiple screens and/or user interface components may be required to provide all of the application’s functionality. While the user is able to perform this specific action themselves within the calendar application, computing system 900 may generate instructions for performing this action automatically on behalf of the user. In this way, users may not be required to navigate through applications to find their desired functionality or perform specific actions. Instead, users may simply interact with a single application widget and access their desired functionality and/or have their desired actions performed immediately. That is, the techniques described herein may provide user 920 with a mechanism to “shortcut” the complexity of various actions.
[0209] In the example of FIG. 9, with explicit consent from user 920, computing system 900 may retrieve, using API module 906, a first set of instructions (e.g., API response data) associated with a first plurality of functions included in an application. In general, the “first plurality of functions” may be functions, or “functionality”, e.g., capabilities or features of an application, that are provided by the values, settings, or other data that are directly embedded into the source code of an application, rather than those that are dynamically generated or configurable at runtime. The “first plurality of functions” may include functionality provided by values, logic, etc. that are fixed, e.g., “hard-coded”, in an application’s source code, and cannot be easily changed without modifying the code itself. As such, the “first plurality of functions” may be considered statically defined functions, or functions that are predefined at compile time or build time and do not change during execution. The “first set of instructions associated with a plurality of functions” described herein may refer to information, data, etc. that can be retrieved, e.g., via an API, from one or more applications installed on a computing device, such as computing device 912.
[0210] Computing system 900 may also receive, from computing device 912, and provided that user 920 has given explicit consent, an indication of a natural language user input (e.g., audio or text input from user 920) associated with one or more functions from the first plurality of functions included in the application. In general, the indication of a natural language user input may represent user 920’s command or desired functionality for the application. Using the retrieved first set of instructions, computing system 900 may apply machine learning module 910 to the received natural language user input to generate a second set of instructions (e.g., code) that includes instructions for generating a corresponding GUI, graphical component, and/or the user’s desired functionality for the application.
[0211] In general, the “second set of instructions” may be dynamically generated at runtime based on user input and retrieved information, including data associated with the predefined or statically defined functions, capabilities, or features from the one or more applications. That is, the second set of instructions may be associated with one or more functions from a second plurality of functions. The “second plurality of functions” may be considered dynamically generated or configurable functions that may adapt or change based on input data and/or other conditions at runtime. In some examples, the “second plurality of functions” may be considered to be “included in” one or more applications, in that the second plurality of functions may be based on the first plurality of functions and may be determined to be possible functions for the one or more applications (e.g., the second plurality of functions may not include functions for performing a funds transfer if no banking applications are installed). The second set of instructions may be considered dynamically generated code that provides corresponding GUIs, GUI components, and/or application functionality based on user input.
[0212] As such, even if user 920’s desired functionality for the application is not statically defined or predefined at compile time or build time, computing system 900 may generate new code that provides user 920’s desired functionality, so long as the desired functionality is determined to be a possible functionality for the application (e.g., machine learning module 910 may determine whether the desired functionality is reasonable for the application, and/or computing system 900 may determine whether an API request can return information required for the desired functionality). Computing system 900 may then send the second set of instructions to computing device 912, in which the computing device may then use the second set of instructions to generate a corresponding GUI component (e.g., a widget) on GUI 916 as well as provide the user’s desired functionality for the application. The corresponding GUI component (e.g., a widget) may be or include at least one graphical component associated with the one or more functions from the second plurality of functions. That is, the corresponding GUI component (e.g., widget) may include graphical components and/or graphical elements that are associated with or provide user 920’s desired functionality. [0213] As an example, computing system 900 may retrieve, using API module 906, a first set of instructions associated with a first plurality of functions included in a weather application (e.g., represented by widget 915F) executing at computing device 912. Computing system 900 may then receive an indication of a natural language user input that is associated with one or more functions from the first plurality of functions included in the weather application (e.g., user 920 may provide a voice input to computing device 912 such as “Current
temperature” when interacting with (e.g., pressing down on) widget 915F). Computing system 900 may then apply, using the first set of instructions, machine learning module 910 to the “Current temperature” input to generate a second set of instructions associated with one or more functions from a second plurality of functions (e.g., functions that are not statically defined or predefined at compile time or build time for the weather application, such as functions for displaying information via a new GUI or graphical component). The second set of instructions may include instructions for generating a GUI or a GUI component to display the current temperature, such as widget 984. Computing system 900 may then send, to computing device 912, the second set of instructions. As shown in the example of FIG. 9, widget 984 may be displayed on GUI 916 and provide the current temperature to user 920 without user 920 having to navigate through the larger weather application. As another example, widget 983 may have been generated based on the predefined functions in a banking application represented by widget 915G and user 920 providing an input such as “Show checking account balance.” In this example, computing system 900 generated instructions for generating widget 983 and providing user 920’s desired functionality. As another example, widget 985 may have been generated based on the predefined functions in an Internet browser application represented by widget 915H and user 920 providing an input such as “Convert cups to milliliters.” In this example, computing system 900 generated instructions for generating widget 985 and providing user 920’s desired functionality. As shown in the example of FIG. 9, new widgets 983, 984, and 985 may be saved or presented on GUI 916 of computing device 912 for future use.
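An end-to-end sketch of this flow follows, with placeholder classes standing in for API module 906 and machine learning module 910; the class names, method signatures, and prompt fields are assumptions of this sketch, not the actual component interfaces:

```python
# Hypothetical sketch of the touch-and-talk flow: retrieve the first set
# of instructions for the touched application, combine it with the spoken
# command, and generate a second set of instructions describing a widget.

class ApiModule:
    def get_functions(self, app_id: str) -> list[str]:
        # Placeholder: statically defined functions retrieved from the app.
        return ["get_current_conditions", "get_forecast"]

class LanguageModel:
    def generate(self, prompt: dict) -> dict:
        # Placeholder: a real model would emit widget instructions here.
        return {"widget": "current_temperature",
                "bound_function": prompt["available_functions"][0]}

api_module, language_model = ApiModule(), LanguageModel()

def generate_widget(app_id: str, utterance: str) -> dict:
    """Produce a second set of instructions from app functions + command."""
    prompt = {
        "available_functions": api_module.get_functions(app_id),
        "user_input": utterance,            # e.g., "Current temperature"
        "output": "widget instructions",
    }
    return language_model.generate(prompt)

widget_984 = generate_widget("weather_app", "Current temperature")
```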
[0214] In some examples, computing system 900 may generate instructions for a new GUI that includes at least one graphical component associated with the one or more functions from the second plurality of functions. That is, in some examples, new widgets 983, 984, and 985 may be saved or presented on a new GUI that is different from GUI 916.
[0215] In some examples, computing system 900 generates instructions for a GUI component (e.g., a widget) associated with one or more suggested natural language user inputs. In these examples, the one or more suggested natural language user inputs may be based on one or more historical natural language user inputs. For example, in the example of FIG. 9, the natural language user input may be provided via a “touch and talk” feature. For example, user 920 may hold down on widget 915G (which represents a banking application) with their finger, which is a gesture that may correspond to a user interface component 902 (e.g., a microphone) of computing device 912. While holding down on widget 915G, and without opening the banking application, user 920 may also be presented with GUI component 993,
which may be considered a “pop-up widget” that displays suggested inputs or commands for the banking application. A pop-up widget is typically designed to be temporary and overlay the existing content on a screen. As such, when user 920 releases their finger from widget 915G, GUI component 993 may disappear. As shown in the example of FIG. 9, though, GUI component 993 includes suggested input “Send $20 to Jane...” The suggested input or inputs provided by GUI component 993 may be based on one or more capabilities provided by the application (e.g., transferring funds from one bank account to another, paying a bill, etc.). In other words, the suggested input may represent one or more functions from the first plurality of functions included in an application. In some examples, the suggested input or inputs may be based on user 920 providing such inputs previously (i.e., based on one or more historical natural language user inputs) when executing or interacting with the banking application. In some examples, the suggested input may be based on actions frequently performed by a user. For example, if a user frequently navigates through multiple screens to check their account balance, the suggested input may include an input that results in the generation of widget 983. In some other examples, computing system 900 may generate a second GUI that includes at least one graphical component associated with one or more suggested natural language user inputs. For example, instead of a pop-up widget, computing system 900 may generate instructions for an overlay GUI that displays one or more suggested natural language user inputs.
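Deriving suggested inputs from historical commands can be sketched as a frequency ranking; the history data below is invented for illustration:

```python
# Hypothetical sketch: rank a user's historical commands for one
# application by frequency to populate the pop-up widget's suggestions.

from collections import Counter

history = [
    "Send $20 to Jane", "Check account balance", "Send $20 to Jane",
    "Pay electricity bill", "Check account balance", "Send $20 to Jane",
]

def suggested_inputs(history: list[str], top_n: int = 3) -> list[str]:
    return [cmd for cmd, _ in Counter(history).most_common(top_n)]

popup_widget_993 = {"suggestions": suggested_inputs(history)}
# -> "Send $20 to Jane" ranks first, matching the most frequent command.
```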
[0216] If user 920 provides an input such as “Send $20 to Jane” while holding down on widget 915G with their finger, a microphone or other UI component 902 on computing device 912 may capture the input and send, to computing system 900, the input as the indication of the natural language user input. In this way, computing system 900 may receive the indication of the natural language user input with context information, e.g., the input received by computing system 900 may further indicate that user 920 was holding down on the banking application widget 915G. As such, computing system 900 may more accurately determine user 920’s intent of transferring funds from their bank account.
[0217] In one way, the “touch and talk” feature may “decouple” an application’s user interface from the application’s functionality and capabilities. In other words, when a user holds down on widget 915G (which represents the banking application) with their finger, they may provide an input associated with a desired application function or capability that is not statically defined in the application’s source code or predefined at compile time or build time. Thus, a user is not restricted to providing inputs associated with functionality and capabilities already provided by application developers, and users can customize applications so long as
their desired functions or capabilities are within reason (e.g., as determined by computing system 900) or adhere to a set of rules pertaining to the application (e.g., while a user may request a new GUI and/or graphical component to be generated from the banking application that provides functionality for transferring funds to a contact, the user may not be able to request a new GUI and/or graphical component to be generated from the banking application that provides functionality for making a phone call).
[0218] Various aspects of the techniques described in this disclosure may facilitate better user experience with applications executing on user devices. For example, rather than a user having to navigate through multiple user interfaces in an application to access their desired information or functionality, a user may simply touch the main widget for the application, say their intent or command, and the computing device may provide a new widget that displays the desired information and/or provides the desired functionality. As such, the techniques described may provide more assistance to users when interacting with devices and applications, and may improve overall user experience when interacting with devices and applications. Furthermore, provided that the techniques described include generating new code based on user intent, users may be able to personalize the functionality of applications with which they interact without requiring a developer of the application to actually add features or otherwise update the application.
[0219] In general, API module 906 may retrieve a first set of instructions (e.g., API response data, etc.) from an application executing on computing device 912, which user interface generator module 908 may interpret in order to understand the functionality provided by the application. Interface generator module 908 may further use the first set of instructions and other device information (e.g., user interaction information) to contextualize the indication of a natural language user input when applying machine learning module 910. For example, continuing with the banking example above, interface generator module 908 may receive the “Send $20 to Jane...” input string in addition to a first set of instructions associated with the funds transfer capabilities included in the banking application. As such, machine learning module 910 may receive more context for the user input and thus more accurately interpret the user input.
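One way to picture this contextualization step is as prompt assembly: the API response data describing the application’s functions is combined with the utterance and device information before the model is applied. The sketch below is illustrative only; the prompt format and the build_model_input helper are assumptions, not the actual interface of machine learning module 910.

def build_model_input(api_response: dict, utterance: str, device_context: dict) -> str:
    # Ground the model in the application's real capabilities (the first set
    # of instructions) so it can map the utterance onto existing functions.
    functions = "\n".join(
        f"- {name}: {desc}" for name, desc in api_response.get("functions", {}).items()
    )
    return (
        "Application functions:\n"
        + functions
        + f"\n\nDevice context: {device_context}\n"
        + f"User input: {utterance}\n"
        + "Generate instructions for a GUI component that fulfills the input."
    )

prompt = build_model_input(
    {"functions": {"transfer_funds": "transfer money between accounts or to a contact"}},
    "Send $20 to Jane",
    {"held_widget": "widget_915G"},
)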
[0220] In some examples, with explicit consent from user 920, computing system 900 may determine which accessibility actions are frequently performed by user 920 when interacting with a GUI or application such that the new GUIs and/or GUI components generated by user interface generator module 908 can be better tailored to user 920’s needs. For example, in the case where user 920 is unable to provide a text input, user interface generator module 908 may generate instructions for a GUI component such as widget 985 that provides user 920 its functionality when user 920 provides a voice command such as “Convert 12 cups to milliliters.”
[0221] Additionally, in some examples, API module 906 may retrieve information pertaining to every element included in a GUI, such as GUI 916, no matter the type of element. In some examples, the first set of instructions may include accessibility information. In some examples, the accessibility information may be associated with a “view hierarchy” of a GUI of the application executing at the computing device, wherein the GUI may be represented as a tree of GUI views. In some examples, this hierarchy may demonstrate a hierarchy of information presented via a GUI, such as a category, subcategory, and sub-subcategory. In some examples, the first set of instructions may include information associated with a plurality of user interface elements included in the application. In some examples, computing system 900 may retrieve information associated with the plurality of user interface elements included in GUI 916 via API module 906, wherein the information includes one or more of a node type, textual content associated with a node, an action that can be performed on a node, a relationship between one or more nodes, or a plurality of accessibility features included in a node. In these examples, interface generator module 908 may use this information to determine the format, size, color scheme, accessibility features, or any other features to include in the second set of instructions (e.g., new code) for generating a new GUI component (e.g., new customized widget) and functionality for an application.
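To make the view-hierarchy idea concrete, the node-level information listed above can be modeled as a small tree structure that a generator module could walk. This is a minimal sketch under assumed names (ViewNode, flatten); the disclosure does not prescribe this representation.

from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class ViewNode:
    # One node of a GUI view hierarchy: type, textual content, performable
    # actions, accessibility features, and parent/child relationships.
    node_type: str
    text: str = ""
    actions: List[str] = field(default_factory=list)
    accessibility: List[str] = field(default_factory=list)
    children: List["ViewNode"] = field(default_factory=list)

def flatten(node: ViewNode, depth: int = 0) -> Iterator[Tuple[int, ViewNode]]:
    # Walk the tree so every element can be inspected, regardless of type,
    # when choosing format, size, color scheme, or accessibility features.
    yield depth, node
    for child in node.children:
        yield from flatten(child, depth + 1)

root = ViewNode("screen", children=[
    ViewNode("header", text="Accounts"),
    ViewNode("button", text="Transfer", actions=["click"],
             accessibility=["content_description"]),
])
for depth, n in flatten(root):
    print("  " * depth + n.node_type, n.text)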
[0222] In some examples, computing system 900 may also provide users the ability to configure various accessibility and/or display options according to their needs. For example, a user may be able to adjust user interface components of GUI 916 by changing text size, enabling color correction, setting up magnification gestures, and configuring gesture-based navigation for GUI 916.
[0223] In some examples, a user may also edit or update the desired functionality and new GUI and/or GUI component for an application. In other words, computing system 900 may receive an updated natural language user input (e.g., a user may provide a voice command such as “Show three most recent transactions instead” to edit widget 983). Computing system 900 may then apply machine learning module 910 to the updated natural language user input to update the second set of instructions (e.g., to update instructions file 350 of FIG. 3C), wherein the second set of instructions then includes instructions for generating an updated GUI component (e.g., an updated widget 983 that shows the user’s three most recent transactions instead of their checking balance). Computing system 900 may then send the updated
second set of instructions to computing device 912 to display the updated GUI component and functionality to the user via GUI 916. In some other examples, the updated second set of instructions may include instructions for generating an updated GUI (e.g., in examples in which a new GUI was generated based on the natural language user input). Additionally, as described above, in some examples, the user may be prompted to clarify their intent if the natural language user input is unclear. In these examples, computing system 900 may generate an intermediate set of instructions for generating a GUI component or prompt with which the user may interact to clarify their intent or input. Responsive to receiving the clarified natural language user input, computing system 900 may then generate the updated set of instructions.
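The update-or-clarify control flow described in this paragraph might look like the following sketch. StubModel stands in for machine learning module 910 and returns canned interpretations; the real module’s interface is not specified by this disclosure.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Interpretation:
    ambiguous: bool
    question: str = ""
    component_spec: Optional[dict] = None

class StubModel:
    # Stand-in for machine learning module 910; behavior is illustrative.
    def interpret(self, utterance: str, current: dict) -> Interpretation:
        if "instead" in utterance:  # e.g., "Show three most recent transactions instead"
            return Interpretation(False, component_spec={"widget": "transactions", "count": 3})
        return Interpretation(True, question="Which account do you mean?")

def handle_update(utterance: str, model, current: dict) -> dict:
    # Either regenerate the second set of instructions from the updated input,
    # or emit an intermediate set of instructions prompting for clarification.
    result = model.interpret(utterance, current)
    if result.ambiguous:
        return {"type": "clarification_prompt", "question": result.question}
    return {"type": "updated_instructions",
            "instructions": dict(current, component=result.component_spec)}

print(handle_update("Show three most recent transactions instead", StubModel(),
                    {"component": {"widget": "checking_balance"}}))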
[0224] In some examples, the second set of instructions for generating the new GUI component and application functionality may be shared between users. In other words, computing system 900 may receive, from computing device 912, a request to send the second set of instructions to companion device 981 that is associated with computing device 912. Computing system 900 may then send the second set of instructions to companion device 981 to display the new GUI component and functionality to another user via UI components 982. In this way, users may share GUIs and widgets, such as widgets 983, 984, 985, and/or other GUIs and widgets described herein, and new application functionality with each other. In some examples, a first user operating computing device 912 may send a widget to a second user operating companion device 981 via, for example, Short Message Service (SMS). For example, the first user may copy and paste widget 984 into a text message that is then sent to the second user, after which the second user may copy and paste widget 984 onto a home screen of companion device 981.
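The sharing path can be pictured as serializing the second set of instructions and handing them to any available channel addressed to the companion device. The in-memory transport below is a toy stand-in for SMS or another messaging service; its interface is an assumption made for illustration.

import json

class InMemoryTransport:
    # Toy stand-in for SMS or another channel between devices.
    def __init__(self):
        self.outbox = []
    def send(self, recipient: str, message: str) -> None:
        self.outbox.append((recipient, message))

def share_instructions(instructions: dict, companion_id: str, transport) -> None:
    # Serialize the second set of instructions so the companion device can
    # reconstruct the widget and its functionality locally.
    transport.send(companion_id, json.dumps(instructions))

transport = InMemoryTransport()
share_instructions({"widget": "widget_984", "function": "unit_conversion"},
                   "companion_981", transport)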
[0225] As described above, the techniques described herein may all be implemented locally on a computing device, such as computing device 912. For example, computing device 912 may retrieve, using an API, a first set of instructions associated with a first plurality of functions included in an application executing at computing device 912, such as a banking application represented by widget 915G. Computing device 912 may then receive, from a user operating computing device 912 and via UI components 902, an indication of a natural language user input associated with one or more functions from the first plurality of functions included in the application. Computing device 912 may then apply, using the first set of instructions, a machine learning model, such as a large language model, to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions (e.g., a user’s desired functions), wherein
the second set of instructions includes instructions for generating a GUI component (e.g., widget 983) that may provide the user’s desired functionality. For example, with respect to FIG. 9, computing device 912 may generate a second set of instructions (e.g., instructions file 350 of FIG. 3C) including instructions for generating a user’s desired application functionality and/or an associated GUI component on GUI 916. For example, responsive to a user interacting with banking application widget 915G (e.g., by holding their finger down on widget 915G) and providing an indication of a natural language user input such as “Show checking account balance...” at computing device 912 (e.g., via UI components 902), computing device 912 may then generate a set of instructions for generating widget 983 on GUI 916, using the techniques described above, which provides the user the ability to view their checking account balance without having to navigate through the banking application.
[0226] FIG. 10 is a flowchart illustrating an example operation for dynamically generating custom graphical user interfaces for one or more applications, in accordance with one or more techniques of this disclosure. The example of FIG. 10 is described with respect to FIGS. 1-8.
[0227] Computing system 100 retrieves information associated with a plurality of functions included in one or more applications (1086). In some examples, computing system 100 also retrieves one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with one or more graphical user interfaces. In some examples, computing system 100 stores the retrieved information in instructions storage 222. Computing system 100 receives an indication of a natural language user input associated with the plurality of functions included in the one or more applications (1087). In some examples, the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display.
[0228] In some examples, computing system 100 applies speech-to-text module 226 to convert audio data indicative of the natural language user input to text data. In some examples, computing system 100 applies machine learning module 310 to the indication of the natural language input or the text data to identify one or more tasks, in which each task from the one or more tasks is associated with a respective category from one or more categories (1088). In some examples, computing system 100 applies machine learning module 310 to the indication of the natural language input or the text data to identify the one or more categories. In some examples, computing system 100 applies language model module 342 including a large language model to the indication of the natural language input or the text data to identify the one or more tasks and/or the one or more categories.
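The task-and-category identification step can be illustrated as prompting a language model for structured output. The prompt wording, the llm callable, and the JSON shape below are assumptions made for illustration; they do not reflect the actual interfaces of machine learning module 310 or language model module 342.

import json

def identify_tasks(text: str, llm) -> list:
    # Ask the model to extract tasks and the category each belongs to,
    # returning structured JSON that downstream generation steps can consume.
    prompt = (
        "Extract tasks from the user input. Return a JSON list of "
        '{"task": ..., "category": ...} objects.\n'
        f"User input: {text}"
    )
    return json.loads(llm(prompt))

# A stub model shows the expected shape of the output.
fake_llm = lambda p: '[{"task": "book pediatrician appointment", "category": "Family"}]'
print(identify_tasks("Book Jane a checkup next week", fake_llm))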
[0229] Computing system 100 applies, using the information associated with the plurality of functions and/or other information stored in instructions storage 222, machine learning module 310 to the one or more tasks to generate instructions file 350 including a set of instructions (1089). In some examples, the set of instructions provides at least one function for performing a respective task from the one or more tasks. In some examples, instructions file 350 includes instructions for generating at least one GUI associated with the respective category, in which the at least one GUI associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task. For example, instructions file 350 may include instructions for generating GUI 417 associated with a “Family” category (demonstrated by “FAMILY” header 451), and GUI 766 associated with a “Trip” category (demonstrated by “TRIP” header 765). GUI 417 may include widget 452 associated with at least one function for performing an identified task, and widget 455 associated with at least one function for performing another identified task, in which both tasks are associated with the “Family” category. GUI 766 may include widget 767A associated with at least one function for performing an identified task associated with the “Trip” category.
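A possible shape for the resulting instructions file, with one GUI per category and components bound to functions for the identified tasks, is sketched below. The schema is purely illustrative and is not the actual format of instructions file 350.

# Illustrative structure only; the keys and values are assumptions.
instructions_file = {
    "guis": [
        {
            "category": "Family",
            "header": "FAMILY",
            "components": [
                {"widget_id": "452", "function": "book_pediatrician_appointment"},
                {"widget_id": "455", "function": "suggest_pediatricians"},
            ],
        },
        {
            "category": "Trip",
            "header": "TRIP",
            "components": [
                {"widget_id": "767A", "function": "book_accommodation"},
            ],
        },
    ]
}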
[0230] In some examples, computing system 100 receives one or more of an additional indication of a user input and context information from the one or more applications, and updates, based on one or more of the additional indication of a user input and the context information, the at least one GUI. As an example, computing system 100 may receive an additional indication of a user input 664 that includes the query, “How many days until the kids start school?” Computing system 100 may also receive context information, e.g., from a calendar application. Based on the additional indication of a user input 664 and the context information, computing system 100 may update GUI 617 to include widget 663 that is associated with functionality for counting the days until the user’s children start the next school year.
[0231] In some examples, a GUI associated with a respective category includes one or more of at least one graphical component including text data associated with the respective category, at least one graphical component including text data associated with information from the one or more applications, at least one graphical component associated with one or more suggested inputs, and at least one suggested graphical component associated with the at least one function for performing the respective task. For example, GUI 866 may include text header 865 “TRIP” associated with the “Trip” category, and GUI 866 may further include text summary 875 associated with information from the one or more applications, in which
text summary 875 provides a short summary of the subtasks and information relevant to a task of, e.g., planning a trip. As another example, GUI 766 may include sub-widgets 781, 782, and 783, which may each be associated with one or more suggested inputs. As another example, GUI 866 may include suggested widget 880, which may be associated with at least one function for performing a task or subtask such as, e.g., booking a dinner reservation for the trip.
[0232] In some examples, the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of the indication of the natural language user input, historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface. For example, text header 865 “TRIP” associated with the “Trip” category may be based on an indication of a natural language user input such as, “Plan the trip with John.” As another example, sub-widgets 781, 782, and 783, which may each be associated with one or more suggested inputs, may be based on historical natural language user inputs, context information from the one or more applications, user data, and/or information associated with GUI 766 that indicates, e.g., a user’s preferences. As another example, text summary 875 may be based on, e.g., context information retrieved from a messaging application that indicates John booked airline tickets, and information associated with GUI 866, such as widget 878 that indicates the trip accommodation has been booked.
[0233] In some examples, the at least one GUI associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs. In these examples, responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, computing system 100 updates the at least one graphical component associated with the at least one function for performing the respective task. For example, GUI 417 may include widget 455 including buttons 456 that each correspond to a suggested pediatrician. Responsive to a user selecting a button 456 that corresponds to “Pediatrician A,” computing system 100 updates widget 455 associated with the at least one function for performing the task of booking Jane’s appointment at pediatrician A, in which widget 455 may then be updated or replaced as widget 558 or widget 661.
[0234] In some examples, the at least one GUI associated with the respective category includes the at least one suggested graphical component, in which the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control. For example, GUI 766 associated with the “Trip” category may include
sub-widgets 781, 782, and 783, which may each be associated with one or more suggested inputs, and may be based on one or more of keywords 773, 785, and 784, draggable circles 774 each corresponding to a respective keyword, draggable circle 771, and slider 769. In some examples, computing system 100 receives an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control, and updates, based on the indication of the user input, the at least one suggested graphical component. For example, computing system 100 may receive an indication that a user interacted with one of draggable circles 774 to assign a higher level of importance to “Quiet” keyword 773, in which computing system 100 may update suggested sub-widgets 781, 782, and 783 to include one or more updated suggestions that are determined by computing system 100 to be more associated with “Quiet” keyword 773.
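One simple way to realize this behavior is to score each suggested component against user-weighted keywords, so that raising a keyword’s importance (e.g., by dragging its circle outward) reorders the suggestions. The scoring function and the tags below are illustrative assumptions, not the disclosure’s actual ranking method.

def rank_suggestions(suggestions, keyword_weights):
    # Score each suggested component by the summed weights of the keywords
    # it matches, then surface the strongest matches first.
    def score(s):
        return sum(w for kw, w in keyword_weights.items() if kw in s["tags"])
    return sorted(suggestions, key=score, reverse=True)

suggestions = [
    {"id": "781", "tags": ["quiet", "budget"]},
    {"id": "782", "tags": ["nightlife"]},
    {"id": "783", "tags": ["quiet", "central"]},
]
# The user drags the circle for "Quiet" outward, assigning it more importance.
print(rank_suggestions(suggestions, {"quiet": 0.9, "nightlife": 0.2}))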
[0235] In some examples, the one or more applications are one or more applications executing at computing device 112, in which computing system 100 sends, to computing device 112, instructions file 350. In some examples, some or all of the techniques described herein with respect to the computing system may be implemented on or performed by the computing device, and vice versa. In some examples, computing system 100 receives, from computing device 112, a request to send instructions file 350 to a companion device associated with computing device 112, and computing system 100 sends, to the companion device, instructions file 350.
[0236] In some examples, the at least one graphical component includes a first graphical component and a second graphical component, in which the first graphical component is associated with a first function for performing the respective task, and the second graphical component is associated with a second function for performing the respective task. For example, widget 767A may be associated with at least one function for performing a task of booking an accommodation. Widget 767A may further include sub-widget 768, which may be associated with a function for setting a budget for booking the accommodation, and sub-widget 770, which may be associated with a function for setting a preferred location for booking the accommodation.
[0237] FIG. 11 is a flowchart illustrating another example operation for dynamically generating custom graphical user interfaces for one or more applications, in accordance with one or more techniques of this disclosure. The example of FIG. 11 is described with respect to FIGS. 1-9.
[0238] Computing system 900 retrieves, using API module 906, information associated with a plurality of functions included in one or more applications (1190). In some examples, the
one or more applications are executing at computing device 912, such as an application associated with widget 915G. Computing system 900 receives an indication of a natural language user input associated with the plurality of functions included in the one or more applications (1191). In some examples, the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display, such as GUI 916, that corresponds to a graphical component, such as widget 915G, associated with one of the one or more applications. In some examples, computing system 900 may generate at least one graphical component, such as GUI component 993, associated with one or more suggested natural language user inputs. In some examples, the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
[0239] Computing system 900 applies, using the information associated with the plurality of functions included in the one or more applications, machine learning module 910 to the indication of the natural language user input to generate instructions file 350, in which instructions file 350 includes instructions for generating at least one graphical component, such as widget 983 (1192). In some examples, language model module 342, which includes a large language model, is applied to the indication of the natural language user input. In some examples, computing system 900 sends instructions file 350 to computing device 912. In some examples, the at least one graphical component, such as widget 983, is associated with at least one function for performing a task.
[0240] In some examples, computing system 900 may receive, from computing device 912, a request to send instructions file 350 to companion device 981 that is associated with computing device 912, and then send, to companion device 981, instructions file 350. In some examples, computing system 900 is configured to update instructions file 350 responsive to receiving an updated natural language user input. In some examples, computing system 900 may receive the updated natural language user input, and apply machine learning module 910 to the updated natural language user input to update instructions file 350. Instructions file 350 may then include instructions for generating at least one updated graphical component. Computing system 900 may send, to computing device 912, updated instructions file 350.
[0241] In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible
medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that may be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
[0242] By way of example, and not limitation, such computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other storage medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0243] Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules. Also, the techniques could be fully implemented in one or more circuits or logic elements.
[0244] The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, various units may be combined in a hardware unit or provided by a collection of interoperative hardware units, including one or more processors, in conjunction with suitable software and/or firmware.
[0245] It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein may be performed in a different sequence, may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.
[0246] In some examples, a computer-readable storage medium comprises a non-transitory medium. The term “non-transitory” indicates that the storage medium is not embodied in a carrier wave or a propagated signal. In certain examples, a non-transitory storage medium may store data that can, over time, change (e.g., in RAM or cache).
[0247] Example 1: A method includes retrieving, by a computing system, information associated with a plurality of functions included in one or more applications; receiving, by the computing system, an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and applying, by the computing system, and using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
[0248] Example 2: The method of example 1, wherein the one or more applications are one or more applications executing at a computing device.
[0249] Example 3: The method of example 2, wherein the method further includes sending, by the computing system and to the computing device, the set of instructions.
[0250] Example 4: The method of example 3, wherein the method further includes receiving, by the computing system and from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and sending, by the computing system and to the companion device, the set of instructions.
[0251] Example 5: The method of any of examples 1-4, wherein the machine learning model is a large language model.
[0252] Example 6: The method of any of examples 1-5, wherein the at least one graphical component is associated with at least one function for performing a task.
[0253] Example 7: The method of any of examples 1-6, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical component associated with an application from the one or more applications.
[0254] Example 8: The method of example 7, wherein the method further includes generating, by the computing system, at least one graphical component associated with one or more suggested natural language user inputs.
[0255] Example 9: The method of example 8, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
[0256] Example 10: The method of any of examples 1-9, wherein the method further includes updating, by the computing system, the set of instructions responsive to receiving an updated natural language user input.
[0257] Example 11: The method of example 10, wherein the method further includes receiving, by the computing system, the updated natural language user input; and applying, by the computing system, the machine learning model to the updated natural language user input to update the set of instructions, wherein the set of instructions includes instructions for generating at least one updated graphical component.
[0258] Example 12: A computing system comprising: one or more processors; and one or more storage devices that store instructions, wherein the instructions, when executed by the one or more processors, cause the one or more processors to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
[0259] Example 13: The computing system of example 12, wherein the one or more applications are one or more applications executing at a computing device.
[0260] Example 14: The computing system of example 13, wherein the instructions further cause the one or more processors to: send, to the computing device, the set of instructions.
[0261] Example 15: The computing system of example 14, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
[0262] Example 16: The computing system of any of examples 12-15, wherein the machine learning model is a large language model.
[0263] Example 17: The computing system of any of examples 12-16, wherein the at least one graphical component is associated with at least one function for performing a task.
[0264] Example 18: The computing system of any of examples 12-17, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical component associated with an application from the one or more applications.
[0265] Example 19: The computing system of example 18, wherein the instructions further cause the one or more processors to: generate at least one graphical component associated with one or more suggested natural language user inputs.
[0266] Example 20: The computing system of example 19, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
[0267] Example 21: The computing system of any of examples 12-20, wherein the instructions further cause the one or more processors to: update the set of instructions responsive to receiving an updated natural language user input.
[0268] Example 22: The computing system of example 21, wherein the instructions further cause the one or more processors to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the set of instructions, wherein the set of instructions includes instructions for generating at least one updated graphical component.
[0269] Example 23: A non-transitory computer-readable storage medium encoded with instructions that, when executed by one or more processors of a computing device, cause the one or more processors to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
[0270] Example 24: The non-transitory computer-readable medium of example 23, wherein the one or more applications are one or more applications executing at a computing device.
[0271] Example 25: The non-transitory computer-readable medium of example 24, wherein the instructions further cause the one or more processors to: send, to the computing device, the set of instructions.
[0272] Example 26: The non-transitory computer-readable medium of example 25, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
[0273] Example 27: The non-transitory computer-readable medium of any of examples 23-26, wherein the machine learning model is a large language model.
[0274] Example 28: The non-transitory computer-readable medium of any of examples 23-27, wherein the at least one graphical component is associated with at least one function for performing a task.
[0275] Example 29: The non-transitory computer-readable medium of any of examples 23-28, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical component associated with an application from the one or more applications.
[0276] Example 30: The non-transitory computer-readable medium of example 29, wherein the instructions further cause the one or more processors to: generate at least one graphical component associated with one or more suggested natural language user inputs.
[0277] Example 31: The non-transitory computer-readable medium of example 30, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
[0278] Example 32: The non-transitory computer-readable medium of any of examples 23-31, wherein the instructions further cause the one or more processors to: update the set of instructions responsive to receiving an updated natural language user input.
[0279] Example 33: The non-transitory computer-readable medium of example 32, wherein the instructions further cause the one or more processors to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the set of instructions, wherein the set of instructions includes instructions for generating at least one updated graphical component.
[0280] Example 34: A computer program product for generating custom graphical components for one or more applications, the computer program product comprising one or
more instructions that, when executed by at least one processor, cause the at least one processor to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
[0281] Example 35: The computer program product of example 34, wherein the one or more applications are one or more applications executing at a computing device.
[0282] Example 36: The computer program product of example 35, wherein the one or more instructions further cause the at least one processor to: send, to the computing device, the set of instructions.
[0283] Example 37: The computer program product of example 36, wherein the one or more instructions further cause the at least one processor to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
[0284] Example 38: The computer program product of any of examples 34-37, wherein the machine learning model is a large language model.
[0285] Example 39: The computer program product of any of examples 34-38, wherein the at least one graphical component is associated with at least one function for performing a task.
[0286] Example 40: The computer program product of any of examples 34-39, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical component associated with an application from the one or more applications.
[0287] Example 41: The computer program product of example 40, wherein the one or more instructions further cause the at least one processor to: generate at least one graphical component associated with one or more suggested natural language user inputs.
[0288] Example 42: The computer program product of example 41, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
[0289] Example 43: The computer program product of any of examples 34-42, wherein the one or more instructions further cause the at least one processor to: update the set of instructions responsive to receiving an updated natural language user input.
[0290] Example 44: The computer program product of example 43, wherein the one or more
instructions further cause the at least one processor to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the set of instructions, wherein the set of instructions includes instructions for generating at least one updated graphical component.
[0291] Example 45: A method includes retrieving, by a computing system, information associated with a plurality of functions included in one or more applications; receiving, by the computing system, an indication of a natural language user input associated with the plurality of functions included in the one or more applications; applying, by the computing system, a machine learning model to the indication of the natural language user input to identify one or more tasks, wherein each task from the one or more tasks is associated with a respective category from one or more categories; and applying, by the computing system, and using the information associated with the plurality of functions, the machine learning model to the one or more tasks to generate a set of instructions, wherein the set of instructions is associated with at least one function for performing a respective task from the one or more tasks, wherein the set of instructions includes instructions for generating at least one graphical user interface associated with the respective category, and wherein the at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
[0292] Example 46: The method of example 45, wherein the method further includes applying, by the computing system, the machine learning model to the indication of the natural language user input to identify the one or more categories.
[0293] Example 47: The method of any of examples 45 and 46, wherein the one or more applications are one or more applications executing at a computing device, wherein the method further includes sending, by the computing system and to the computing device, the set of instructions.
[0294] Example 48: The method of example 47, wherein the method further includes receiving, by the computing system and from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and sending, by the computing system and to the companion device, the set of instructions.
[0295] Example 49: The method of any of examples 45 through 48, wherein the at least one graphical user interface associated with the respective category includes one or more of: at least one graphical component including text data associated with the respective category; at least one graphical component including text data associated with information from the one or more applications; at least one graphical component associated with one or more suggested
inputs; and at least one suggested graphical component associated with the at least one function for performing the respective task.
[0296] Example 50: The method of example 49, wherein the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
[0297] Example 51: The method of any of examples 49 and 50, wherein the at least one graphical user interface associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs, wherein the method further includes, responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, updating, by the computing system, the at least one graphical component associated with the at least one function for performing the respective task.
[0298] Example 52: The method of any of examples 49 through 51, wherein the at least one graphical user interface associated with the respective category includes the at least one suggested graphical component, and wherein the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control.
[0299] Example 53: The method of example 52, wherein the method further includes receiving, by the computing system, an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control; and updating, by the computing system and based on the indication of the user input, the at least one suggested graphical component.
[0300] Example 54: The method of any of examples 45 through 53, wherein the at least one graphical component includes a first graphical component and a second graphical component, wherein the first graphical component is associated with a first function for performing the respective task, and wherein the second graphical component is associated with a second function for performing the respective task.
[0301] Example 55: The method of any of examples 45 through 54, wherein the method further includes receiving, by the computing system, one or more of an additional indication of a user input and context information from the one or more applications; and updating, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface.
[0302] Example 56: The method of any of examples 45 through 55, wherein the machine learning model is a large language model.
[0303] Example 57: A computing system includes one or more processors; and one or more storage devices that store instructions, wherein the instructions, when executed by the one or more processors, cause the one or more processors to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; apply a machine learning model to the indication of the natural language user input to identify one or more tasks, wherein each task from the one or more tasks is associated with a respective category from one or more categories; and apply, using the information associated with the plurality of functions, the machine learning model to the one or more tasks to generate a set of instructions, wherein the set of instructions is associated with at least one function for performing a respective task from the one or more tasks, wherein the set of instructions includes instructions for generating at least one graphical user interface associated with the respective category, and wherein the at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
[0304] Example 58: The computing system of example 57, wherein the instructions further cause the one or more processors to: apply the machine learning model to the indication of the natural language user input to identify the one or more categories.
[0305] Example 59: The computing system of any of examples 57 and 58, wherein the one or more applications are one or more applications executing at a computing device, and wherein the instructions further cause the one or more processors to: send, to the computing device, the set of instructions.
[0306] Example 60: The computing system of example 59, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
[0307] Example 61: The computing system of any of examples 57 through 60, wherein the at least one graphical user interface associated with the respective category includes one or more of: at least one graphical component including text data associated with the respective category; at least one graphical component including text data associated with information from the one or more applications; at least one graphical component associated with one or more suggested inputs; and at least one suggested graphical component associated with the at least one function for performing the respective task.
[0308] Example 62: The computing system of example 61, wherein the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
[0309] Example 63: The computing system of any of examples 61 and 62, wherein the at least one graphical user interface associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs, wherein the instructions further cause the one or more processors to: responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, update the at least one graphical component associated with the at least one function for performing the respective task.
[0310] Example 64: The computing system of any of examples 61 through 63, wherein the at least one graphical user interface associated with the respective category includes the at least one suggested graphical component, and wherein the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control.
[0311] Example 65: The computing system of example 64, wherein the instructions further cause the one or more processors to: receive an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control; and update, based on the indication of the user input, the at least one suggested graphical component.
[0312] Example 66: The computing system of any of examples 57 through 65, wherein the at least one graphical component includes a first graphical component and a second graphical component, wherein the first graphical component is associated with a first function for performing the respective task, and wherein the second graphical component is associated with a second function for performing the respective task.
[0313] Example 67: The computing system of any of examples 57 through 66, wherein the instructions further cause the one or more processors to: receive one or more of an additional indication of a user input and context information from the one or more applications; and update, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface.
[0314] Example 68: The computing system of any of examples 57 through 67, wherein the machine learning model is a large language model.
[0315] Example 69: The computing system of any of examples 57 through 68, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display.
[0316] Example 70: A non-transitory computer-readable storage medium encoded with instructions that, when executed by one or more processors, cause one or more processors to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; apply a machine learning model to the indication of the natural language user input to identify one or more tasks, wherein each task from the one or more tasks is associated with a respective category from one or more categories; and apply, using the information associated with the plurality of functions, the machine learning model to the one or more tasks to generate a set of instructions, wherein the set of instructions is associated with at least one function for performing a respective task from the one or more tasks, wherein the set of instructions includes instructions for generating at least one graphical user interface associated with the respective category, and wherein the at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
[0317] Example 71: The non-transitory computer-readable storage medium of example 70, wherein the instructions further cause the one or more processors to: apply the machine learning model to the indication of the natural language user input to identify the one or more categories.
[0318] Example 72: The non-transitory computer-readable storage medium of any of examples 70 and 71, wherein the one or more applications are one or more applications executing at a computing device, and wherein the instructions further cause the one or more processors to: send, to the computing device, the set of instructions.
[0319] Example 73: The non-transitory computer-readable storage medium of example 72, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and send, to the companion device, the set of instructions.
[0320] Example 74: The non-transitory computer-readable storage medium of any of examples 70 through 73, wherein the at least one graphical user interface associated with the
respective category includes one or more of: at least one graphical component including text data associated with the respective category; at least one graphical component including text data associated with information from the one or more applications; at least one graphical component associated with one or more suggested inputs; and at least one suggested graphical component associated with the at least one function for performing the respective task.
[0321] Example 75: The non-transitory computer-readable storage medium of example 74, wherein the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
[0322] Example 76: The non-transitory computer-readable storage medium of any of examples 74 and 75, wherein the at least one graphical user interface associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs, wherein the instructions further cause the one or more processors to: responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, update the at least one graphical component associated with the at least one function for performing the respective task.
[0323] Example 77: The non-transitory computer-readable storage medium of any of examples 74 through 76, wherein the at least one graphical user interface associated with the respective category includes the at least one suggested graphical component, and wherein the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control.
[0324] Example 78: The non-transitory computer-readable storage medium of example 77, wherein the instructions further cause the one or more processors to: receive an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control; and update, based on the indication of the user input, the at least one suggested graphical component.
[0325] Example 79: The non-transitory computer-readable storage medium of any of examples 70 through 78, wherein the at least one graphical component includes a first graphical component and a second graphical component, wherein the first graphical component is associated with a first function for performing the respective task, and wherein
the second graphical component is associated with a second function for performing the respective task.
[0326] Example 80: The non-transitory computer-readable storage medium of any of examples 70 through 79, wherein the instructions further cause the one or more processors to: receive one or more of an additional indication of a user input and context information from the one or more applications; and update, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface.
[0327] Example 81: The non-transitory computer-readable storage medium of any of examples 70 through 80, wherein the machine learning model is a large language model.
[0328] Example 82: The non-transitory computer-readable storage medium of any of examples 70 through 81, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display.
[0329] Example 83: A computer program product for generating custom user interfaces and functionality for performing tasks, the computer program product comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; apply a machine learning model to the indication of the natural language user input to identify one or more tasks, wherein each task from the one or more tasks is associated with a respective category from one or more categories; and apply, using the information associated with the plurality of functions, the machine learning model to the one or more tasks to generate a set of one or more instructions, wherein the set of instructions is associated with at least one function for performing a respective task from the one or more tasks, wherein the set of one or more instructions includes one or more instructions for generating at least one graphical user interface associated with the respective category, and wherein the at least one graphical user interface associated with the respective category includes at least one graphical component associated with the at least one function for performing the respective task.
[0330] Example 84: The computer program product of example 83, wherein the one or more instructions further cause the at least one processor to: apply the machine learning model to the indication of the natural language user input to identify the one or more categories.
[0331] Example 85: The computer program product of any of examples 83 and 84, wherein the one or more applications are one or more applications executing at a computing device, and wherein the one or more instructions further cause the at least one processor to: send, to the computing device, the set of one or more instructions.
[0333] Example 86: The computer program product of example 85, wherein the one or more instructions further cause the at least one processor to: receive, from the computing device, a request to send the set of one or more instructions to a companion device associated with the computing device; and send, to the companion device, the set of one or more instructions.
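Example 86's relay step might look like the sketch below; `is_paired` and `send` stand in for whatever pairing registry and transport the computing system actually uses.

```python
def handle_relay_request(system, device_id: str, companion_id: str,
                         instruction_set: bytes) -> None:
    """Forward a generated instruction set from the requesting computing
    device to its associated companion device (e.g. a paired watch)."""
    if not system.is_paired(device_id, companion_id):  # hypothetical registry
        raise PermissionError("companion device is not associated")
    system.send(companion_id, instruction_set)  # hypothetical transport
```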
[0334] Example 87: The computer program product of any of examples 83 through 86, wherein the at least one graphical user interface associated with the respective category includes one or more of: at least one graphical component including text data associated with the respective category; at least one graphical component including text data associated with information from the one or more applications; at least one graphical component associated with one or more suggested inputs; and at least one suggested graphical component associated with the at least one function for performing the respective task.
[0335] Example 88: The computer program product of example 87, wherein the text data associated with the category, the text data associated with the information, the one or more suggested inputs, and the at least one suggested graphical component are based on one or more of historical natural language user inputs, context information from the one or more applications, user data, and information associated with the at least one graphical user interface.
[0336] Example 89: The computer program product of any of examples 87 and 88, wherein the at least one graphical user interface associated with the respective category includes the at least one graphical component associated with the one or more suggested inputs, wherein the one or more instructions further cause the at least one processor to: responsive to receiving input indicative of a selection of a suggested input from the one or more suggested inputs, update the at least one graphical component associated with the at least one function for performing the respective task.
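A sketch of the update path in Example 89: choosing a suggested input regenerates the component bound to the task's function. The state shape and the prompt are assumptions.

```python
import json

def on_suggestion_selected(llm, ui_state: dict, suggestion: str) -> dict:
    """Selecting a suggested input updates the graphical component
    associated with the function for performing the task."""
    prompt = (f"Component: {json.dumps(ui_state['function_component'])}\n"
              f"Selected suggestion: {suggestion}\n"
              "Return the updated component as JSON.")
    ui_state["function_component"] = json.loads(llm.complete(prompt))
    return ui_state
```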
[0337] Example 90: The computer program product of any of examples 87 through 89, wherein the at least one graphical user interface associated with the respective category includes the at least one suggested graphical component, and wherein the at least one suggested graphical component is based on one or more of at least one keyword and at least one user-configurable control.
[0338] Example 91: The computer program product of example 90, wherein the one or more instructions further cause the at least one processor to: receive an indication of a user input associated with one or more of the at least one keyword and at least one user-configurable control; and update, based on the indication of the user input, the at least one suggested graphical component.
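Examples 90 and 91 tie the suggested component to keywords and user-configurable controls; one hypothetical rendering of that dependency (names and schema are illustrative):

```python
import json

def refresh_suggested_component(llm, keywords: list[str],
                                controls: dict[str, object]) -> dict:
    """Regenerate the suggested component whenever the user edits a
    keyword or a user-configurable control."""
    prompt = (f"Keywords: {', '.join(keywords)}\n"
              f"Controls: {json.dumps(controls)}\n"
              "Suggest one component as JSON with type, label, function.")
    return json.loads(llm.complete(prompt))
```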
[0339] Example 92: The computer program product of any of examples 83 through 91, wherein the at least one graphical component includes a first graphical component and a second graphical component, wherein the first graphical component is associated with a first function for performing the respective task, and wherein the second graphical component is associated with a second function for performing the respective task.
[0340] Example 93: The computer program product of any of examples 83 through 92, wherein the one or more instructions further cause the at least one processor to: receive one or more of an additional indication of a user input and context information from the one or more applications; and update, based on one or more of the additional indication of a user input and the context information, the at least one graphical user interface.
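Examples 92 and 93 imply a feedback loop in which both further user input and application context can trigger regeneration; a compact sketch under the same hypothetical schema:

```python
import json

def update_interface(llm, current_ui: dict, user_event: str | None = None,
                     app_context: dict | None = None) -> dict:
    """Regenerate the interface from an additional user input, fresh
    context reported by the applications, or both."""
    prompt = (f"Interface: {json.dumps(current_ui)}\n"
              f"New user input: {user_event or 'none'}\n"
              f"New app context: {json.dumps(app_context or {})}\n"
              "Return the updated interface as JSON.")
    return json.loads(llm.complete(prompt))
```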
[0341] Example 94: The computer program product of any of examples 83 through 93, wherein the machine learning model is a large language model.
[0342] Example 95: The computer program product of any of examples 83 through 94, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display.
[0343] Example 96: A computing system comprising means for performing any combination of the methods of examples 45-56.
[0344] Example 97: A computer-readable storage medium encoded with instructions for performing any combination of the methods of examples 45-56.
[0345] Example 98: A method includes retrieving, by a computing system, a first set of instructions associated with a first plurality of functions included in an application; receiving, by the computing system, an indication of a natural language user input associated with one or more functions from the first plurality of functions included in the application; and applying, by the computing system, and using the first set of instructions, a machine learning model to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions included in the application, wherein the second set of instructions includes instructions for generating a first graphical user interface.
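Example 98 differs from the earlier pipeline in that the model is conditioned on a first instruction set (for example, declarative UI markup already shipped with the application) and emits a second one. A sketch, with the instruction format left abstract:

```python
def derive_second_instruction_set(llm, first_instructions: str,
                                  nl_input: str) -> str:
    """Apply the model, using the first set of instructions, to generate
    a second set that exercises further functions of the application and
    includes instructions for rendering a first graphical user interface."""
    prompt = ("Existing application instructions:\n"
              f"{first_instructions}\n\n"
              f"User request: {nl_input}\n"
              "Emit a new instruction set in the same format that adds a "
              "GUI exposing the functions needed for this request.")
    return llm.complete(prompt)  # hypothetical client call
```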
[0346] Example 99: The method of example 98, wherein the application is an application executing at a computing device.
[0347] Example 100: The method of example 99, further comprising sending, by the computing system and to the computing device, the second set of instructions.
[0348] Example 101: The method of example 100, further comprising: receiving, by the computing system and from the computing device, a request to send the second set of instructions to a companion device associated with the computing device; and sending, by the computing system and to the companion device, the second set of instructions.
[0349] Example 102: The method of any of examples 98 through 101, wherein the machine learning model is a large language model.
[0350] Example 103: The method of any of examples 98 through 102, wherein the first graphical user interface includes at least one graphical component associated with the one or more functions from the second plurality of functions included in the application.
[0351] Example 104: The method of any of examples 98 through 103, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical user interface component associated with the application.
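Example 104's trigger, sketched with a hypothetical hit-testing API: the gesture location selects the component under it, which in turn scopes the natural language prompt that follows.

```python
def on_gesture(display, x: int, y: int) -> None:
    """Map a gesture on the presence-sensitive display to the graphical
    component beneath it, then solicit a natural language input scoped
    to that component's application."""
    component = display.component_at(x, y)  # hypothetical hit test
    if component is not None:
        display.show_text_prompt(
            f"What should this {component.kind} in {component.app_name} do?")
```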
[0352] Example 105: The method of example 104, further comprising: generating, by the computing system, a second graphical user interface, wherein the second graphical user interface includes at least one graphical component associated with one or more suggested natural language user inputs.
[0353] Example 106: The method of example 105, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
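The history-based suggestions of Examples 105 and 106 can be as simple as frequency ranking; a deployed system might instead rank candidates with the model itself. A minimal sketch:

```python
from collections import Counter

def suggest_inputs(history: list[str], k: int = 3) -> list[str]:
    """Return the k most frequent historical natural language inputs as
    suggested inputs for the second graphical user interface."""
    return [text for text, _ in Counter(history).most_common(k)]
```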
[0354] Example 107: The method of any of examples 98-106, wherein the computing system is configured to update the second set of instructions responsive to receiving an updated natural language user input.
[0355] Example 108: The method of example 107, further comprising: receiving, by the computing system, the updated natural language user input; and applying, by the computing system, the machine learning model to the updated natural language user input to update the second set of instructions, wherein the second set of instructions includes instructions for generating an updated graphical user interface.
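Examples 107 and 108 close the loop: an updated natural language input revises the generated instruction set while keeping the previous generation in context. One way this could be phrased, with the prompt purely illustrative:

```python
def refine_instruction_set(llm, first_instructions: str,
                           second_instructions: str, updated_input: str) -> str:
    """Apply the model to the updated input to update the second set of
    instructions, including an updated graphical user interface."""
    prompt = (f"Application instructions:\n{first_instructions}\n"
              f"Previously generated instructions:\n{second_instructions}\n"
              f"Revision request: {updated_input}\n"
              "Emit the updated instruction set.")
    return llm.complete(prompt)  # hypothetical client call
```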
[0356] Example 109: A computing system includes: one or more processors; and one or more storage devices that store instructions, wherein the instructions, when executed by the one or more processors, cause the one or more processors to: retrieve a first set of instructions associated with a first plurality of functions included in an application; receive an indication of a natural language user input associated with one or more functions from the first plurality
of functions included in the application; and apply, using the first set of instructions, a machine learning model to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions included in the application, wherein the second set of instructions includes instructions for generating a first graphical user interface.
[0357] Example 110: The computing system of example 109, wherein the application is an application executing at a computing device.
[0358] Example 111: The computing system of example 110, wherein the instructions further cause the one or more processors to send, to the computing device, the second set of instructions.
[0359] Example 112: The computing system of example 111, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the second set of instructions to a companion device associated with the computing device; and send, to the companion device, the second set of instructions.
[0360] Example 113: The computing system of any of examples 109 through 112, wherein the machine learning model is a large language model.
[0361] Example 114: The computing system of any of examples 109 through 113, wherein the first graphical user interface includes at least one graphical component associated with the one or more functions from the second plurality of functions included in the application.
[0362] Example 115: The computing system of any of examples 109 through 114, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical user interface component associated with the application.
[0363] Example 116: The computing system of example 115, wherein the instructions further cause the one or more processors to: generate a second graphical user interface, wherein the second graphical user interface includes at least one graphical component associated with one or more suggested natural language user inputs.
[0364] Example 117: The computing system of example 116, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
[0365] Example 118: The computing system of any of examples 109 through 117, wherein the instructions further cause the one or more processors to update the second set of instructions responsive to receiving an updated natural language user input.
[0366] Example 119: The computing system of example 118, wherein the instructions further cause the one or more processors to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the second set of instructions, wherein the second set of instructions includes instructions for generating an updated graphical user interface.
[0367] Example 121: A non-transitory computer-readable storage medium encoded with instructions that, when executed by one or more processors, cause the one or more processors to: retrieve a first set of instructions associated with a first plurality of functions included in an application; receive an indication of a natural language user input associated with one or more functions from the first plurality of functions included in the application; and apply, using the first set of instructions, a machine learning model to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions included in the application, wherein the second set of instructions includes instructions for generating a first graphical user interface.
[0368] Example 122: The non-transitory computer-readable storage medium of example 121, wherein the application is an application executing at a computing device.
[0369] Example 123: The non-transitory computer-readable storage medium of example 122, wherein the instructions further cause the one or more processors to send, to the computing device, the second set of instructions.
[0370] Example 124: The non-transitory computer-readable storage medium of example 123, wherein the instructions further cause the one or more processors to: receive, from the computing device, a request to send the second set of instructions to a companion device associated with the computing device; and send, to the companion device, the second set of instructions.
[0371] Example 125: The non-transitory computer-readable storage medium of any of examples 121 through 124, wherein the machine learning model is a large language model.
[0372] Example 126: The non-transitory computer-readable storage medium of any of examples 121 through 125, wherein the first graphical user interface includes at least one graphical component associated with the one or more functions from the second plurality of functions included in the application.
[0373] Example 127: The non-transitory computer-readable storage medium of any of examples 121 through 126, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical user interface component associated with the application.
[0374] Example 128: The non-transitory computer-readable storage medium of example 127, wherein the instructions further cause the one or more processors to: generate a second graphical user interface, wherein the second graphical user interface includes at least one graphical component associated with one or more suggested natural language user inputs.
[0375] Example 129: The non-transitory computer-readable storage medium of example 128, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
[0376] Example 130: The non-transitory computer-readable storage medium of any of examples 121 through 129, wherein the instructions further cause the one or more processors to update the second set of instructions responsive to receiving an updated natural language user input.
[0377] Example 131: The non-transitory computer-readable storage medium of example 130, wherein the instructions further cause the one or more processors to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the second set of instructions, wherein the second set of instructions includes instructions for generating an updated graphical user interface.
[0378] Example 132: A computer program product for generating custom graphical components for one or more applications, the computer program product comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to: retrieve a first set of instructions associated with a first plurality of functions included in an application; receive an indication of a natural language user input associated with one or more functions from the first plurality of functions included in the application; and apply, using the first set of instructions, a machine learning model to the indication of the natural language user input to generate a second set of instructions associated with one or more functions from a second plurality of functions included in the application, wherein the second set of instructions includes instructions for generating a first graphical user interface.
[0379] Example 133: The computer program product of example 132, wherein the application is an application executing at a computing device.
[0380] Example 134: The computer program product of example 133, wherein the one or more instructions further cause the at least one processor to send, to the computing device, the second set of instructions.
[0381] Example 135: The computer program product of example 134, wherein the one or more instructions further cause the at least one processor to: receive, from the computing device, a request to send the second set of instructions to a companion device associated with the computing device; and send, to the companion device, the second set of instructions.
[0382] Example 136: The computer program product of any of examples 132 through 135, wherein the machine learning model is a large language model.
[0383] Example 137: The computer program product of any of examples 132 through 136, wherein the first graphical user interface includes at least one graphical component associated with the one or more functions from the second plurality of functions included in the application.
[0384] Example 138: The computer program product of any of examples 132 through 137, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical user interface component associated with the application.
[0385] Example 139: The computer program product of example 138, wherein the one or more instructions further cause the at least one processor to: generate a second graphical user interface, wherein the second graphical user interface includes at least one graphical component associated with one or more suggested natural language user inputs.
[0386] Example 140: The computer program product of example 139, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
[0387] Example 141: The computer program product of any of examples 132 through 140, wherein the one or more instructions further cause the at least one processor to update the second set of instructions responsive to receiving an updated natural language user input.
[0388] Example 142: The computer program product of example 141, wherein the one or more instructions further cause the at least one processor to: receive the updated natural language user input; and apply the machine learning model to the updated natural language user input to update the second set of instructions, wherein the second set of instructions includes instructions for generating an updated graphical user interface.
[0389] Various examples have been described. These and other examples are within the scope of the following claims.
Claims
1. A method comprising: retrieving, by a computing system, information associated with a plurality of functions included in one or more applications; receiving, by the computing system, an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and applying, by the computing system, and using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
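Read end to end, claim 1 is a three-step pipeline; the sketch below labels each step with the claim language. Every name here (`describe_functions`, `llm.complete`, the JSON schema) is hypothetical, since the claim fixes no particular API.

```python
import json

def claim_1_method(llm, apps, nl_input: str) -> dict:
    # (1) retrieve information associated with the functions the
    #     applications expose
    functions = [f for app in apps for f in app.describe_functions()]
    # (2) receive the natural language user input (here, `nl_input`)
    # (3) apply the model, using that information, to generate a set of
    #     instructions that includes at least one graphical component
    prompt = (f"Functions: {json.dumps(functions)}\n"
              f"Request: {nl_input}\n"
              "Emit JSON instructions with a list of components bound "
              "to these functions.")
    return json.loads(llm.complete(prompt))
```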
2. The method of claim 1, wherein the one or more applications are one or more applications executing at a computing device.
3. The method of claim 2, further comprising: sending, by the computing system and to the computing device, the set of instructions.
4. The method of claim 3, further comprising: receiving, by the computing system and from the computing device, a request to send the set of instructions to a companion device associated with the computing device; and sending, by the computing system and to the companion device, the set of instructions.
5. The method of any of claims 1-4, wherein the machine learning model is a large language model.
6. The method of any of claims 1-5, wherein the at least one graphical component is associated with at least one function for performing a task.
7. The method of any of claims 1-6, wherein the indication of the natural language user input is received in response to a gesture detected at a location of a presence-sensitive display that corresponds to a graphical component associated with an application from the one or more applications.
8. The method of claim 7, further comprising: generating, by the computing system, at least one graphical component associated with one or more suggested natural language user inputs.
9. The method of claim 8, wherein the one or more suggested natural language user inputs are based on one or more historical natural language user inputs.
10. The method of any of claims 1-9, further comprising: updating, by the computing system, the set of instructions responsive to receiving an updated natural language user input.
11. The method of claim 10, further comprising: receiving, by the computing system, the updated natural language user input; and applying, by the computing system, the machine learning model to the updated natural language user input to update the set of instructions, wherein the set of instructions includes instructions for generating at least one updated graphical component.
12. A computing system comprising: one or more processors; and one or more storage devices that store instructions, wherein the instructions, when executed by the one or more processors, cause the one or more processors to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
13. The computing system of claim 12, wherein the at least one graphical component is associated with at least one function for performing a task.
14. A non-transitory computer-readable storage medium encoded with instructions that, when executed by one or more processors of a computing device, cause the one or more processors to perform any of the methods of claims 1-11.
15. A computer program product for generating custom graphical components for one or more applications, the computer program product comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to: retrieve information associated with a plurality of functions included in one or more applications; receive an indication of a natural language user input associated with the plurality of functions included in the one or more applications; and apply, using the information associated with the plurality of functions, a machine learning model to the indication of the natural language user input to generate a set of instructions, wherein the set of instructions includes instructions for generating at least one graphical component.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363586242P | 2023-09-28 | 2023-09-28 | |
| US63/586,242 | 2023-09-28 | ||
| US202463697201P | 2024-09-20 | 2024-09-20 | |
| US63/697,201 | 2024-09-20 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025072505A1 (en) | 2025-04-03 |
Family
ID=93100642
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/048636 (WO2025072505A1, pending) | Using large language models to generate user interface components | 2023-09-28 | 2024-09-26 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025072505A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120523923A (en) * | 2025-07-25 | 2025-08-22 | 合肥工业大学 | News information processing method, device, electronic device and storage medium |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021201982A1 (en) * | 2020-03-31 | 2021-10-07 | Microsoft Technology Licensing, Llc | Automating tasks for a user across their mobile applications |
Similar Documents
| Publication | Title |
|---|---|
| US11875123B1 (en) | Advice generation system |
| US12430330B2 (en) | Calibrating confidence scores of a machine learning model trained as a natural language interface |
| US11017180B2 (en) | System and methods for processing and interpreting text messages |
| US12242971B2 (en) | Adversarial training of machine learning models |
| US20230409615A1 (en) | Systems and Methods for Providing User Experiences on Smart Assistant Systems |
| JP7726995B2 (en) | Enhanced Logit for Natural Language Processing |
| US20240362409A1 (en) | Data Insight Generation and Presentation |
| JP2024539003A (en) | Fine-tuning multi-head networks from a single transformer layer on pre-trained language models |
| US20250094821A1 (en) | Multi-task fine-tuning for planning performed by large language model |
| WO2022235353A1 (en) | Variant inconsistency attack (via) as a simple and effective adversarial attack method |
| US20240062044A1 (en) | Addressing catastrophic forgetting and over-generalization while training a natural language to a meaning representation language system |
| US12164503B1 (en) | Database management systems and methods for datasets |
| WO2025072505A1 (en) | Using large language models to generate user interface components |
| WO2025071985A1 (en) | Using large language models to generate view-based accessibility information |
| US20250190844A1 (en) | Database and data structure management systems and methods facilitating deviation detection of statistical properties of data |
| WO2024249180A1 (en) | Heterogeneous feature interactions with transformers |
| WO2024097683A1 (en) | Game performance prediction across a device ecosystem |
| US12450240B2 (en) | Database management systems |
| US12430319B2 (en) | Proactive database management systems |
| US12321358B1 (en) | Database management systems |
| US20250390316A1 (en) | Dynamic and customized gui generation and processing systems |
| US20250181602A1 (en) | Database and data structure management systems facilitating dataset consolidation |
| US12488244B1 (en) | Apparatus and method for data generation for user engagement |
| US20250190406A1 (en) | Ai-based converson of a natural language prompt to a system-specific segment definition using entity reduction and renaming |
| US20250086870A1 (en) | Recommendation systems for generating virtual environments based on personalized recommendations |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24790064; Country of ref document: EP; Kind code of ref document: A1 |