US12386886B2 - Contextual text lookup for images - Google Patents
Contextual text lookup for images
- Publication number
- US12386886B2 (application number US17/973,500)
- Authority
- US
- United States
- Prior art keywords
- image
- text
- application
- search results
- obtaining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/532—Query formulation, e.g. graphical querying
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/538—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/907—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/908—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/768—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using context analysis, e.g. recognition aided by known co-occurring patterns
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
Definitions
- the present description generally relates to machine learning, including, for example, using machine learning for contextual text lookup for images.
- Conventional search engines are configured to perform searches for strings of text, typically entered by a user into a browser application at an end user's device. For example, a user of an electronic device that sees a product name may open a browser application, type the product name into the browser application, and submit the typed text to a search engine for lookup via the browser application.
- FIG. 1 illustrates an example network environment in accordance with one or more implementations of the subject technology.
- FIG. 2 illustrates an example system in accordance with one or more implementations of the subject technology.
- FIG. 3 illustrates an example of an electronic device displaying an image in accordance with one or more implementations of the subject technology.
- FIG. 4 illustrates an example of an electronic device displaying context-based search results for image text in accordance with one or more implementations of the subject technology.
- FIG. 5 illustrates a flowchart of an example process that may be performed by an electronic device for contextual text lookup for an image in accordance with one or more implementations of the subject technology.
- FIG. 6 illustrates a flowchart of another example process that may be performed by an electronic device for contextual text lookup for an image in accordance with one or more implementations of the subject technology.
- FIG. 7 illustrates a flowchart of an example process that may be performed by a server for contextual text lookup for an image in accordance with one or more implementations of the subject technology.
- FIG. 8 illustrates an example electronic system with which aspects of the subject technology may be implemented in accordance with one or more implementations.
- Electronic devices can recognize text in an image being displayed by the electronic device, and provide a user with options to select or otherwise interact with the recognized text in the image.
- the user may request a search for (e.g., a lookup of) some or all of the text in a displayed image.
- the subject technology provides improved text lookup for text identified in an image displayed by an electronic device, by using other contextual information in the image, and/or associated with the image, to enhance the search results.
- contextual information may be derived from the image and may include other text in the image or on the screen, an object type of an object in the image or on the screen, one or more embeddings of portions of the image, application information for an application displaying the image, location information for the image and/or for the device displaying the image, or any other information that can be extracted from or derived from the image.
- the contextual information can be used to locally rank server-provided search results at the device displaying the image, or the contextual information can be sent to a server with the selected text to enhance the search results from the server.
- the server-provided search results can also be displayed with locally generated dictionary results (e.g., a dictionary entry for a word or words in the selected text) in one or more implementations.
- the network environment 100 includes a computing device 110 (also referred to herein as an electronic device), and a server 120.
- the network 106 may communicatively (directly or indirectly) couple the computing device 110 and/or the server 120 .
- the network 106 may be an interconnected network of devices that may include, or may be communicatively coupled to, the Internet.
- the network environment 100 is illustrated in FIG. 1 as including the computing device 110 , and the server 120 ; however, the network environment 100 may include any number of electronic devices and any number of servers.
- the server 120 may train one or more machine learning models for deployment to a client electronic device (e.g., the computing device 110 ).
- the server 120 may provide a system for training a machine learning model using training data, where the trained machine learning model is subsequently deployed locally at the server 120 .
- the machine learning model may be deployed on the server 120 and/or the computing device 110, which may then perform one or more machine learning algorithms.
- the server 120 may provide a cloud service that utilizes the trained machine learning model and is continually refined over time.
- the server 120 may be, and/or may include all or part of, the systems discussed below with respect to FIG. 2 and/or FIG. 8 .
- the system 200 may include a processor 202 , memory 204 (memory device) and a communication unit 210 .
- the memory 204 may store data 206 and one or more machine learning models 208 .
- the system 200 may include or may be communicatively coupled with a storage 212 .
- the storage 212 may be either an internal storage or an external storage.
- the system 200 includes one or more camera(s) 211, a display 214, and one or more sensor(s) 216.
- Camera(s) 211 may be operable to capture images, and may be mounted on a front surface, a rear surface, or any other suitable location on the computing device 110 of FIG. 1.
- the processor 202 may be a single processing unit or multiple processing units.
- the processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
- the processor 202 is configured to fetch and execute computer-readable instructions and data stored in the memory 204 .
- the memory 204 may include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
- the data 206 may represent, amongst other things, a repository of data processed, received, and generated by one or more processors such as the processor 202 .
- One or more of the aforementioned components of the system 200 may send or receive data, for example, using one or more input/output ports and one or more communication units.
- the machine learning model(s) 208 may include one or more of machine learning based models and artificial intelligence-based models, such as, for example, neural networks, or any other models and/or machine learning architectures.
- the machine learning model(s) 208 may be trained using training data (e.g., included in the data 206 or other data) and may be implemented by the processor 202 for performing one or more of the operations, as described herein.
- the communication unit 210 may include one or more hardware units that support wired or wireless communication between the processor 202 and processors of other computing devices.
- an image may be displayed by the computing device 110 implementing the system 200 .
- the image may be stored in the storage 212 , the memory 204 , and/or may be received from a remote device or server.
- the image may be displayed in an image viewing application of the computing device 110 .
- the image may be displayed by a browser application, a social media application, a digital media player application, or any other application that can display images.
- the computing device 110 may display a live preview of a field of view, as captured by a camera of the computing device 110 .
- the processor 202 may be configured to obtain the image being displayed by an application running on the computing device 110 .
- the processor 202 may determine that an image is being displayed or is about to be displayed by an application running at the computing device, and may provide the image and/or a portion thereof to one or more of the machine-learning models 208 .
- the machine learning model(s) 208 may be trained to identify text, and/or one or more elements of interest in the image.
- one or more of the machine-learning models may be configured to recognize text in an image displayed on the display 214 .
- one or more of the machine learning model(s) 208 may receive the image as input and then output an object type of an object in the image and/or may output one or more embeddings of portions of the image.
- the processor 202 may also obtain other contextual information for the image, such as application information indicating the application that is displaying the image, and/or location information for the image and/or for the computing device 110 and/or system 200 .
- the processor 202 may identify the elements of interest in the image.
- a smart camera model may be implemented to detect any text, if present, in the image.
- an object detector may be implemented for identifying and/or classifying objects present in the image.
- a gating model (also referred to herein as a coarse-classification model) may also be implemented.
- a scene classification model may be implemented to detect and classify the overall scene depicted in the image.
- the processor 202 may derive contextual information from an image, such as by determining various types of elements of interest in the image, and/or extracting features, information, and/or other signals from the image.
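As a concrete illustration of this kind of context derivation, the following Python sketch assembles a context record from placeholder model calls. The callables `detect_text`, `classify_objects`, and `embed_region`, and the dictionary keys used here, are assumptions made for the example; they stand in for the machine learning model(s) 208 and are not APIs defined by the disclosure.

```python
from typing import Callable, List, Optional, Tuple

def derive_context(image,
                   detect_text: Optional[Callable] = None,
                   classify_objects: Optional[Callable] = None,
                   embed_region: Optional[Callable] = None,
                   app_id: Optional[str] = None,
                   location: Optional[Tuple[float, float]] = None) -> dict:
    """Collect contextual information for a displayed image (illustrative only).

    The model callables are placeholders for the machine learning model(s) 208:
      detect_text(image)        -> list of recognized strings,
      classify_objects(image)   -> list of (label, bounding_box) tuples,
      embed_region(image, box)  -> embedding vector for a region of interest.
    """
    context = {
        "recognized_text": [],     # e.g., image text 312 and image text 318
        "object_labels": [],       # e.g., "storefront", "plate of food", "menu"
        "region_embeddings": [],   # embeddings of regions of interest
        "application_id": app_id,  # e.g., application ID 308
        "location": location,      # device and/or capture location, if available
    }
    if detect_text is not None:
        context["recognized_text"] = list(detect_text(image))
    if classify_objects is not None:
        detections = list(classify_objects(image))        # [(label, box), ...]
        context["object_labels"] = [label for label, _box in detections]
        if embed_region is not None:
            context["region_embeddings"] = [embed_region(image, box)
                                            for _label, box in detections]
    return context
```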
- FIG. 3 illustrates an example in which the computing device 110 displays an image 300 .
- the image 300 is displayed by an application running on the computing device 110
- an application identifier (ID) 308 for the displaying application is displayed with the image on the display 302 of the computing device 110 .
- the display 302 of FIG. 3 may be an implementation of the display 214 of system 200 of FIG. 2 , in one or more implementations.
- additional information is also displayed with the image 300 on the display 302 of the computing device 110 .
- the additional information includes application controls 310 for the application displaying the image 300 , a current time 304 , and a location indicator 306 (e.g., indicating the current location of the computing device 110 is known, such as based on sensor data from sensor(s) 216 ).
- the application controls 310 may be virtual buttons or other interactive features of the application that is displaying the image 300 (e.g., control buttons or interactive features for controlling a browser application, an image display application, a social media application, a camera application, a media playback application, etc.).
- the image 300 includes image text 312 , image text 318 , and objects such as a foreground object 314 and a background object 316 .
- the computing device 110 may identify an object type (e.g., a classification) of the objects (e.g., foreground object 314 and/or background object 316 ) in the image 300 .
- the computing device 110 may also recognize the image text 312 and the image text 318 , and modify the display of the image 300 to make the image text 312 and/or the image text 318 selectable and/or searchable.
- a user can tap or touch the location of the displayed text in the image 300 , causing a selection tool or highlighter to surface for selection of the displayed text in the image 300 .
- the user can again tap or “right-click” on the selected text to surface options, such as a search or lookup option that causes the computing device 110 to obtain search results for the selected text.
- a user of the computing device 110 can interact with the image 300 using a finger, a cursor, or other input mechanism to select the image text 312 , and can initiate a search for the selected image text 312 .
- the user may see the image text 312 (or otherwise be provided with information indicating the presence of the image text 312 ) in the displayed image 300 and use a voice input to a virtual assistant application running on the computing device 110 to request a search for the image text 312 that is included in the displayed image 300 .
- the image 300 may be an image of a storefront and the image 300 may include image text 312 indicating the name of the store.
- a search for only the image text indicating the name of the store may return search results that are not relevant to the store.
- an image of a restaurant named “Butterfly” may be displayed on a user's smart phone, and the user may request a search for the text “Butterfly” displayed in the image 300 .
- the search results may be unrelated to the desired search results for the text from the image.
- the subject technology provides improved text lookup or search for text identified in an image, by using other contextual information in the image to enhance the search results.
- contextual information may include other text (e.g., unselected and/or unsearched text, such as the image text 318 ) in the image 300 or elsewhere on the display 302 (e.g., text associated with the application ID 308 , and/or text associated with the application controls 310 ), an object type of an object (e.g., the foreground object 314 and/or the background object 316 ) in the image 300 or on the display 302 , one or more embeddings of portions of the image 300 , application information (e.g., the application identifier 308 or an application type) for an application displaying the image 300 , location information for the image 300 and/or for the computing device 110 , etc.
- the computing device 110 may identify one or more objects in the image, such as plates of food, a menu, tables and chairs, doors or windows, or other objects indicative of a restaurant; may identify the relative depths of objects in the image and the relative distances between objects and/or text in the image; and may identify other text in the image (e.g., another word such as "restaurant", "bistro", "cafe", or the like).
- the computing device 110 may then initiate an enhanced search for the image text "butterfly", by including some or all of the derived contextual information in a search request and/or in a sorting of search results obtained without the contextual information.
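One plausible way to fold the derived context into such an enhanced search request is sketched below; the payload field names (`query`, `other_text`, `object_labels`, and so on) are illustrative assumptions, not a request format specified by the disclosure.

```python
def build_search_request(selected_text: str, context: dict) -> dict:
    """Fold derived image context into a hypothetical search request payload."""
    return {
        "query": selected_text,  # e.g., "Butterfly"
        "context": {
            # Other, unselected text recognized in the image (e.g., "restaurant", "cafe").
            "other_text": [t for t in context.get("recognized_text", []) if t != selected_text],
            # Labels of detected objects (e.g., "menu", "table", "plate of food").
            "object_labels": context.get("object_labels", []),
            "application_id": context.get("application_id"),
            "location": context.get("location"),
        },
    }

# Example: "Butterfly" selected in an image of a restaurant storefront. A server
# (or a local re-ranker) can use request["context"] to favor restaurant-related
# results over results about the insect or a song of the same name.
# request = build_search_request("Butterfly", context)
```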
- the image 300 may be any stored or live preview image that includes text in an image context.
- the image 300 may be a rendered user interface of a media playback application
- the image text 312 may be a song title of a song being played back by the media playback application
- the foreground object 314 may be an album cover-art image
- the image text 318 may be an artist name and/or an album title.
- a search for the song name may be enhanced by using contextual information derived from the image 300 , such as the album title, the artist name, and/or the album art.
- information associated with the application displaying the image 300 may also be useful contextual information for enhancing a search for the image text 312 .
- a search for selected text corresponding to a song title may be enhanced by including information indicating a media playback application in the contextual information that informs the search.
- This example of an image text 312 being a song title can particularly illustrate the enhancement provided by including contextual information in the text search when considering that the song title may be “Butterfly”, just as the name of a restaurant can be “Butterfly”.
- by including contextual information indicating the media playback application, an artist's name, an album name, an embedding of a cover art image, and/or an object type of an object displayed in the album cover art, the obtained search results can be related to the song "Butterfly", rather than a restaurant "Butterfly" or the insect "Butterfly".
- depth information may also be derived from the image 300 and used as contextual information for searching for the image text 312 .
- the depth information may be obtained using depth sensors (e.g., depth sensor(s) 216 ) of the computing device 110 while capturing the image 300 , or may be derived from the image 300 itself (e.g., using computer vision and/or other machine learning techniques to identify the relative depths of objects in an image).
- depth information derived for and/or from the image 300 may be used to determine that the foreground object 314 is a foreground object and/or that the background object 316 is a background object.
- a foreground object and/or an object nearer to the searched image text 312 may be weighted more heavily in aiding the search for the image text 312 than a background object or object that is relatively further from the searched image text 312 in the image 300 .
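A minimal sketch of such a weighting heuristic is shown below; the particular formula, the 100-pixel distance scale, and the foreground bonus are arbitrary illustrative choices, not values taken from the disclosure.

```python
import math

def object_weight(object_depth: float, text_depth: float,
                  object_center: tuple, text_center: tuple,
                  foreground_bonus: float = 2.0) -> float:
    """Weight a detected object's label as a context signal (illustrative heuristic).

    Objects closer in depth to the selected text, and nearer to it in the image
    plane, receive larger weights; objects nearer to the camera than the text
    (treated here as foreground objects) get an extra boost.
    """
    depth_gap = abs(object_depth - text_depth)
    pixel_gap = math.dist(object_center, text_center)
    weight = 1.0 / (1.0 + depth_gap) / (1.0 + pixel_gap / 100.0)
    if object_depth < text_depth:
        weight *= foreground_bonus
    return weight
```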
- the contextual information can be used to locally rank server search results for the selected text at the computing device 110, and/or some or all of the contextual information can be sent to a server (e.g., a search server such as server 120) with the image text 312, to enhance the search results from the server.
- the server-obtained search results can also be displayed with locally generated dictionary results in one or more implementations.
- FIG. 4 illustrates an example in which search results for the image text 312 are presented by the computing device 110 .
- the search results for the image text 312 include local result(s) 400, such as locally generated dictionary results obtained by searching for the image text 312 in a local dictionary stored at the computing device 110, and obtaining a dictionary definition, a synonym, an antonym, or other dictionary entry for the image text 312 (e.g., a dictionary entry that is obtained without using contextual information).
- the search results for the image text 312 also include context-based server results 402 .
- the context-based server results 402 may be obtained by providing the image text 312 and contextual information for the image 300 to a server, such as the server 120 of FIG. 1 , and receiving search results generated based both on the image text 312 and the contextual information, or by providing only the image text 312 to the search server, receiving text-only based search results from the server, and sorting or re-ranking the text-only based search results using the contextual information (e.g., to move more relevant search results to the top of the presented list of search results).
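A minimal sketch of the second variant, in which text-only server results are re-ranked locally using the derived context, appears below; the simple term-overlap score is an illustrative stand-in for whatever relevance scoring an implementation actually uses, and the result fields are assumptions.

```python
def rerank_with_context(server_results: list, context: dict) -> list:
    """Reorder text-only server results so context-relevant entries rise to the top.

    Each result is assumed to be a dict with "title" and "snippet" fields; the
    score counts overlaps with context-derived terms (a deliberately simple proxy).
    """
    context_terms = {
        term.lower()
        for term in context.get("recognized_text", []) + context.get("object_labels", [])
        if term
    }

    def context_score(result: dict) -> int:
        text = f'{result.get("title", "")} {result.get("snippet", "")}'.lower()
        return sum(1 for term in context_terms if term in text)

    # Stable sort: results with equal context scores keep their server-provided order.
    return sorted(server_results, key=context_score, reverse=True)
```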
- FIG. 5 illustrates a flow diagram of an example process 500 for performing a contextual text lookup for text in an image, in accordance with one or more implementations.
- the process 500 is primarily described herein with reference to the computing device 110 of FIG. 1 .
- the process 500 is not limited to the computing device 110 , and one or more blocks (or operations) of the process 500 may be performed by one or more other components and/or other suitable devices.
- the blocks of the process 500 are described herein as occurring in serial, or linearly. However, multiple blocks of the process 500 may occur in parallel.
- the blocks of the process 500 need not be performed in the order shown and/or one or more blocks of the process 500 need not be performed and/or can be replaced by other operations.
- an electronic device (e.g., computing device 110) that is displaying an image (e.g., image 300) may receive a request to perform a search for text (e.g., image text 312) in the image.
- receiving the request may include receiving, by the electronic device while displaying the image, a selection of the text in the image (e.g., a user selection of the text).
- receiving the request may include receiving the request from a user via a voice input to the application and/or to a voice assistant application running on the device.
- the image is a flat image (e.g., an array of pixel values without metadata indicating the contents of the image), and the process 500 also includes, prior to receiving the request: detecting, by the electronic device in the flat image while the flat image is displayed, the text; and modifying the display of the flat image to display the text as selectable text.
- the process 500 may also include obtaining the image from memory of the electronic device, from memory of a remote device, or from a camera of the electronic device, and displaying the image with the electronic device (e.g., with the display 302 , as shown in the example of FIG. 3 ).
- the electronic device may detect the text in a flat image responsive to receiving a user interaction with the image, such as an attempt to select the text in the flat image.
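The sketch below illustrates one way recognized text runs could be exposed as selectable regions in a flat image; `recognize_text` is a hypothetical on-device recognizer, and the data structures are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in display coordinates

@dataclass
class SelectableText:
    text: str
    box: Box

def make_text_selectable(flat_image,
                         recognize_text: Callable[[object], List[Tuple[str, Box]]]
                         ) -> List[SelectableText]:
    """Detect text in a flat image and expose each recognized run as a selectable region.

    `recognize_text` is a placeholder for an on-device text recognizer returning
    (string, bounding_box) pairs; a UI layer could hit-test touch or cursor events
    against the returned boxes to allow selection and a subsequent lookup request.
    """
    return [SelectableText(text=text, box=box) for text, box in recognize_text(flat_image)]
```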
- the electronic device may derive contextual information from the image that includes the text.
- deriving the contextual information may include, by the electronic device (e.g., by providing the image as input to one or more of machine learning model(s) 208 ), determining a label for an object (e.g., foreground object 314 , background object 316 , and/or any other image object) in the image.
- the contextual information may be derived from the image prior to receiving the request for the search, or responsive to receiving the request for the search.
- deriving the contextual information may include, by the electronic device (e.g., by providing the image as input to one or more of machine learning model(s) 208 ), obtaining an embedding of a region of interest in the image.
- deriving the contextual information may include, by the electronic device (e.g., by providing the image as input to one or more of machine learning model(s) 208 ), obtaining unselected text and/or unsearched text (e.g., image text 318 or other text not interacted with by the user in connection with the search request) from the image.
- deriving the contextual information may include, by the electronic device (e.g., by providing the image as input to one or more of machine learning model(s) 208 and/or by obtaining location information from a location sensor or process at the electronic device), determining a location associated with the image (e.g., a location at which the image was captured, such as from location metadata of the image and/or by identifying location-specific information, such as a street sign, in the image).
- deriving the contextual information may include, by the electronic device (e.g., by providing the image as input to one or more of machine learning model(s) 208), obtaining depth information associated with the image (e.g., from depth metadata captured using one or more depth sensors (e.g., depth sensors of sensor(s) 216) at the time the image was obtained, and/or by inferring relative depths of objects in the image from the image itself).
- deriving the contextual information may include identifying and/or distinguishing one or more foreground objects (e.g., foreground object 314 ) from one or more background objects (e.g., background object 316 ).
- the electronic device may obtain, responsive to the request, one or more search results based on the text and the contextual information.
- obtaining the one or more search results includes providing the text from the electronic device to a server (e.g., server 120 ), receiving a ranked set of search results from the server at the electronic device, and re-ranking the ranked set of search results based on the contextual information to generate the one or more search results for output by the electronic device.
- the ranked set of search results may be a set of search results that is ranked and/or ordered according to a server-determined relevance to the text.
- relevance to the text alone may not coincide with relevance to the user's desired information about the searched text from a displayed image.
- the contextual information derived from the image may be used to re-rank and/or reorder the set of search results received from the server to place search results most relevant to the searched text and one or more contextual aspects of the image at the top of the displayed/output set of search results.
- obtaining the one or more search results may include providing the text and the contextual information from the electronic device to a server (e.g., the server 120 ), and receiving the one or more search results from the server.
- the one or more search results from the server may already be ranked and/or ordered for relevance based on the contextual information (e.g., without performing a context-based re-ranking at the electronic device).
- the contextual information may include application information (e.g., an application identifier, such as application ID 308 , application text associated with application controls, such as application controls 310 , and/or any other information indicating a particular application or type of application) for an application by which the image is displayed.
- obtaining the one or more search results may include obtaining, by the electronic device, application information for an application by which the image is displayed; and obtaining the one or more search results based on the text, the contextual information, and the application information.
- the computing device 110 may determine that the requested search relates to and/or is the same as some or all of the text displayed in the image, and then obtain search results for the requested search using the contextual information derived from the image based on that determination.
- the electronic device may provide the one or more search results for output by the electronic device.
- providing the one or more search results for output may include providing the one or more search results for display by a display (e.g., display 302 ) of the electronic device (e.g., as described herein in connection with FIG. 4 ).
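As an illustration of providing combined output along the lines of FIG. 4, the sketch below pairs a locally generated dictionary entry with server-provided results; the data shapes and the example definition are assumptions for the example, not a format taken from the disclosure.

```python
def assemble_output(selected_text: str, local_dictionary: dict, server_results: list) -> dict:
    """Combine a locally generated dictionary entry (if any) with server search results.

    The dictionary lookup uses no contextual information, mirroring the local
    result(s) 400; the server results are assumed to already reflect the context
    (or to have been re-ranked locally), mirroring the context-based results 402.
    """
    entry = local_dictionary.get(selected_text.lower())
    return {
        "local_results": [entry] if entry is not None else [],
        "server_results": server_results,
    }

# Usage sketch:
# output = assemble_output(
#     "butterfly",
#     {"butterfly": "an insect with two pairs of large, often brightly colored wings"},
#     reranked_results,
# )
```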
- FIG. 6 illustrates a flow diagram of another example process 600 for performing a contextual text lookup for text in an image, in accordance with one or more implementations.
- the process 600 is primarily described herein with reference to the computing device 110 of FIG. 1 .
- the process 600 is not limited to the computing device 110 , and one or more blocks (or operations) of the process 600 may be performed by one or more other components and/or other suitable devices.
- the blocks of the process 600 are described herein as occurring in serial, or linearly. However, multiple blocks of the process 600 may occur in parallel.
- the blocks of the process 600 need not be performed in the order shown and/or one or more blocks of the process 600 need not be performed and/or can be replaced by other operations.
- an electronic device may receive a request to perform a search for text (e.g., image text 312 ) in an image (e.g., image 300 ) displayed by an application running on the electronic device.
- receiving the request may include receiving, by the electronic device while displaying the image, a selection of the text in the image (e.g., a user selection of the text).
- receiving the request may include receiving the request from a user via a voice input to the application and/or to a voice assistant application running on the device.
- the electronic device may obtain, responsive to the request, application information for the application.
- the application information may include an application identifier, such as application ID 308 , application text associated with application controls, such as application controls 310 , and/or any other information indicating a particular application or type of application.
- the application information may include an application type (e.g., media player, browser, camera, or other type).
- the application information includes a file type of a file accessed by the application and associated with the image (e.g., an audio file type having an associated album artwork image, or a video file type having an associated cover or poster artwork image).
- the application information may be obtained prior to receiving the request for the search, or responsive to receiving the request for the search.
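The following sketch shows one way application information of this kind might be gathered into a context record; the identifiers, the application-type strings, and the file-extension heuristic are illustrative assumptions.

```python
import os
from typing import Optional

def collect_application_info(app_id: str, app_type: str,
                             open_file_path: Optional[str] = None) -> dict:
    """Gather application information usable as search context (illustrative fields only).

    `app_id` and `app_type` identify the application displaying the image (e.g., a
    media player); if that application has a file open that is associated with the
    image (e.g., an audio file whose album artwork is being displayed), the file's
    extension is included as a coarse file-type signal.
    """
    info = {"application_id": app_id, "application_type": app_type}
    if open_file_path:
        info["file_type"] = os.path.splitext(open_file_path)[1].lstrip(".").lower()
    return info

# Example: a song title selected from a media player's now-playing screen.
# collect_application_info("com.example.musicplayer", "media_player", "/music/album/track01.m4a")
# -> {"application_id": "com.example.musicplayer", "application_type": "media_player", "file_type": "m4a"}
```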
- the electronic device may obtain, responsive to the request, one or more search results based on the text and the application information.
- the electronic device may provide the one or more search results for output by the electronic device.
- providing the one or more search results for output may include providing the one or more search results for display by a display (e.g., display 302 ) of the electronic device (e.g., as described herein in connection with FIG. 4 ).
- FIG. 7 illustrates a flow diagram of an example process 700 for performing a contextual text lookup for text in an image at a server, in accordance with one or more implementations.
- the process 700 is primarily described herein with reference to the server 120 of FIG. 1 .
- the process 700 is not limited to the server 120, and one or more blocks (or operations) of the process 700 may be performed by one or more other components and/or other suitable devices.
- the blocks of the process 700 are described herein as occurring in serial, or linearly. However, multiple blocks of the process 700 may occur in parallel.
- the blocks of the process 700 need not be performed in the order shown and/or one or more blocks of the process 700 need not be performed and/or can be replaced by other operations.
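The individual blocks of the process 700 are not reproduced in this excerpt. Purely as an illustration of the server-side role described above (receiving the selected text together with contextual information and returning results ranked with that context in mind), the sketch below scores candidates with a simple keyword heuristic; it is not the claimed ranking method, and the candidate fields are assumptions.

```python
def score_candidates(query: str, context: dict, candidates: list) -> list:
    """Server-side sketch: rank candidate results using both the query and the received context.

    Each candidate is assumed to be a dict with "title" and "body" fields. Matches
    on the query text dominate the score; matches on context terms (other image
    text, object labels) boost results that fit the image's context.
    """
    context_terms = {
        term.lower()
        for term in context.get("other_text", []) + context.get("object_labels", [])
        if term
    }

    def score(candidate: dict) -> float:
        text = f'{candidate.get("title", "")} {candidate.get("body", "")}'.lower()
        query_hits = text.count(query.lower())
        context_hits = sum(1 for term in context_terms if term in text)
        return 2.0 * query_hits + 1.0 * context_hits

    return sorted(candidates, key=score, reverse=True)
```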
- policies and practices should be adapted for the particular types of personal information data being collected and/or accessed and adapted to applicable laws and standards, including jurisdiction-specific considerations which may serve to impose a higher standard. For instance, in the US, collection of or access to certain health data may be governed by federal and/or state laws, such as the Health Insurance Portability and Accountability Act (HIPAA); whereas health data in other countries may be subject to other regulations and policies and should be handled accordingly.
- the present disclosure also contemplates embodiments in which users selectively block the use of, or access to, personal information data. That is, the present disclosure contemplates that hardware and/or software elements can be provided to prevent or block access to such personal information data.
- the present technology can be configured to allow users to select to “opt in” or “opt out” of participation in the collection of personal information data during registration for services or anytime thereafter.
- the present disclosure contemplates providing notifications relating to the access or use of personal information. For instance, a user may be notified upon downloading an app that their personal information data will be accessed and then reminded again just before personal information data is accessed by the app.
- personal information data should be managed and handled in a way to minimize risks of unintentional or unauthorized access or use. Risk can be minimized by limiting the collection of data and deleting data once it is no longer needed.
- data de-identification can be used to protect a user's privacy. De-identification may be facilitated, when appropriate, by removing identifiers, controlling the amount or specificity of data stored (e.g., collecting location data at city level rather than at an address level), controlling how data is stored (e.g., aggregating data across users), and/or other methods such as differential privacy.
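As a small illustration of the de-identification techniques mentioned here, the sketch below coarsens location coordinates to roughly city-level precision and perturbs a value with Laplace noise in the style of basic differential privacy; the granularity and privacy parameters are arbitrary example values.

```python
import random

def coarsen_location(latitude: float, longitude: float, decimals: int = 1) -> tuple:
    """Round coordinates to roughly city-level precision (about 11 km at one decimal place)."""
    return (round(latitude, decimals), round(longitude, decimals))

def add_laplace_noise(value: float, sensitivity: float, epsilon: float) -> float:
    """Add Laplace noise to a numeric value, as in basic differential privacy.

    The difference of two independent exponential samples with the same scale is
    Laplace-distributed, which avoids edge cases of an inverse-CDF formulation.
    """
    scale = sensitivity / epsilon
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return value + noise

# Example: report only a coarse location, with a small random perturbation.
# lat, lon = coarsen_location(37.33182, -122.03118)   # -> (37.3, -122.0)
# noisy_lat = add_laplace_noise(lat, sensitivity=0.1, epsilon=1.0)
```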
- the present disclosure broadly covers use of personal information data to implement one or more various disclosed embodiments, the present disclosure also contemplates that the various embodiments can also be implemented without the need for accessing such personal information data. That is, the various embodiments of the present technology are not rendered inoperable due to the lack of all or a portion of such personal information data.
- the bus 808 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 800 .
- the bus 808 communicatively connects the one or more processing unit(s) 812 with the ROM 810 , the system memory 804 , and the permanent storage device 802 . From these various memory units, the one or more processing unit(s) 812 retrieves instructions to execute and data to process in order to execute the processes of the subject disclosure.
- the one or more processing unit(s) 812 can be a single processor or a multi-core processor in different implementations.
- the ROM 810 stores static data and instructions that are needed by the one or more processing unit(s) 812 and other modules of the electronic system 800 .
- the permanent storage device 802 may be a read-and-write memory device.
- the permanent storage device 802 may be a non-volatile memory unit that stores instructions and data even when the electronic system 800 is off.
- a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) may be used as the permanent storage device 802 .
- the bus 808 also connects to the input and output device interfaces 814 and 806 .
- the input device interface 814 enables a user to communicate information and select commands to the electronic system 800 .
- Input devices that may be used with the input device interface 814 may include, for example, alphanumeric keyboards and pointing devices (also called “cursor control devices”).
- the output device interface 806 may enable, for example, the display of images generated by electronic system 800 .
- Output devices that may be used with the output device interface 806 may include, for example, printers and display devices, such as a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, a flexible display, a flat panel display, a solid state display, a projector, or any other device for outputting information.
- One or more implementations may include devices that function as both input and output devices, such as a touchscreen.
- feedback provided to the user can be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
- the computer-readable storage medium can be any storage medium that can be read, written, or otherwise accessed by a general purpose or special purpose computing device, including any processing electronics and/or processing circuitry capable of executing instructions.
- the computer-readable medium can include any volatile semiconductor memory, such as RAM, DRAM, SRAM, T-RAM, Z-RAM, and TTRAM.
- the computer-readable medium also can include any non-volatile semiconductor memory, such as ROM, PROM, EPROM, EEPROM, NVRAM, flash, nvSRAM, FeRAM, FeTRAM, MRAM, PRAM, CBRAM, SONOS, RRAM, NRAM, racetrack memory, FJG, and Millipede memory.
- the computer-readable storage medium can include any non-semiconductor memory, such as optical disk storage, magnetic disk storage, magnetic tape, other magnetic storage devices, or any other medium capable of storing one or more instructions.
- the tangible computer-readable storage medium can be directly coupled to a computing device, while in other implementations, the tangible computer-readable storage medium can be indirectly coupled to a computing device, e.g., via one or more wired connections, one or more wireless connections, or any combination thereof.
- Instructions can be directly executable or can be used to develop executable instructions.
- instructions can be realized as executable or non-executable machine code or as instructions in a high-level language that can be compiled to produce executable or non-executable machine code.
- instructions also can be realized as or can include data.
- Computer-executable instructions also can be organized in any format, including routines, subroutines, programs, data structures, objects, modules, applications, applets, functions, etc. As recognized by those of skill in the art, details including, but not limited to, the number, structure, sequence, and organization of instructions can vary significantly without varying the underlying logic, function, processing, and output.
- any specific order or hierarchy of blocks in the processes disclosed is an illustration of example approaches. Based upon design preferences, it is understood that the specific order or hierarchy of blocks in the processes may be rearranged, or that all illustrated blocks be performed. Any of the blocks may be performed simultaneously. In one or more implementations, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
- As used in this specification and any claims of this application, the terms "base station", "receiver", "computer", "server", "processor", and "memory" all refer to electronic or other technological devices. These terms exclude people or groups of people.
- display or “displaying” means displaying on an electronic device.
- the phrase “at least one of” preceding a series of items, with the term “and” or “or” to separate any of the items, modifies the list as a whole, rather than each member of the list (i.e., each item).
- the phrase “at least one of” does not require selection of at least one of each item listed; rather, the phrase allows a meaning that includes at least one of any one of the items, and/or at least one of any combination of the items, and/or at least one of each of the items.
- phrases “at least one of A, B, and C” or “at least one of A, B, or C” each refer to only A, only B, or only C; any combination of A, B, and C; and/or at least one of each of A, B, and C.
- a processor configured to monitor and control an operation or a component may also mean the processor being programmed to monitor and control the operation or the processor being operable to monitor and control the operation.
- a processor configured to execute code can be construed as a processor programmed to execute code or operable to execute code.
- phrases such as an aspect, the aspect, another aspect, some aspects, one or more aspects, an implementation, the implementation, another implementation, some implementations, one or more implementations, an embodiment, the embodiment, another embodiment, some embodiments, one or more embodiments, a configuration, the configuration, another configuration, some configurations, one or more configurations, the subject technology, the disclosure, the present disclosure, other variations thereof and alike are for convenience and do not imply that a disclosure relating to such phrase(s) is essential to the subject technology or that such disclosure applies to all configurations of the subject technology.
- a disclosure relating to such phrase(s) may apply to all configurations, or one or more configurations.
- a disclosure relating to such phrase(s) may provide one or more examples.
- a phrase such as an aspect or some aspects may refer to one or more aspects and vice versa, and this applies similarly to other foregoing phrases.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Library & Information Science (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
Claims (20)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/973,500 US12386886B2 (en) | 2022-04-29 | 2022-10-25 | Contextual text lookup for images |
| CN202380037259.0A CN119110946A (en) | 2022-04-29 | 2023-04-27 | Contextual text search for images |
| KR1020247038538A KR20250002579A (en) | 2022-04-29 | 2023-04-27 | Lookup contextual text for an image |
| PCT/US2023/020280 WO2023212248A1 (en) | 2022-04-29 | 2023-04-27 | Contextual text lookup for images |
| EP23726724.0A EP4500367A1 (en) | 2022-04-29 | 2023-04-27 | Contextual text lookup for images |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263336987P | 2022-04-29 | 2022-04-29 | |
| US17/973,500 US12386886B2 (en) | 2022-04-29 | 2022-10-25 | Contextual text lookup for images |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230350941A1 (en) | 2023-11-02 |
| US12386886B2 (en) | 2025-08-12 |
Family
ID=88512139
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/973,500 Active US12386886B2 (en) | 2022-04-29 | 2022-10-25 | Contextual text lookup for images |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12386886B2 (en) |
| EP (1) | EP4500367A1 (en) |
| KR (1) | KR20250002579A (en) |
| CN (1) | CN119110946A (en) |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090287669A1 (en) * | 2008-05-13 | 2009-11-19 | Bennett James D | Image search engine using context screening parameters |
| US20120254076A1 (en) * | 2011-03-30 | 2012-10-04 | Microsoft Corporation | Supervised re-ranking for visual search |
| US20120269116A1 (en) * | 2011-04-25 | 2012-10-25 | Bo Xing | Context-aware mobile search based on user activities |
| US20150033164A1 (en) * | 2013-07-26 | 2015-01-29 | Samsung Electronics Co., Ltd. | Method and apparatus for providing graphic user interface |
| US20160019618A1 (en) * | 2013-05-13 | 2016-01-21 | A9.Com, Inc | Augmented reality recommendations |
| US20160162591A1 (en) * | 2014-12-04 | 2016-06-09 | Microsoft Technology Licensing, Llc | Web Content Tagging and Filtering |
| US9773102B2 (en) * | 2011-09-09 | 2017-09-26 | Microsoft Technology Licensing, Llc | Selective file access for applications |
| US20200250453A1 (en) * | 2019-01-31 | 2020-08-06 | Adobe Inc. | Content-aware selection |
| US20210208741A1 (en) * | 2017-09-13 | 2021-07-08 | Google Llc | Efficiently augmenting images with related content |
| US20220138404A1 (en) * | 2013-08-12 | 2022-05-05 | Microsoft Technology Licensing, Llc | Browsing images via mined hyperlinked text snippets |
| US20220335242A1 (en) * | 2021-04-19 | 2022-10-20 | Apple Inc. | Visual search for electronic devices |
-
2022
- 2022-10-25 US US17/973,500 patent/US12386886B2/en active Active
-
2023
- 2023-04-27 CN CN202380037259.0A patent/CN119110946A/en active Pending
- 2023-04-27 KR KR1020247038538A patent/KR20250002579A/en active Pending
- 2023-04-27 EP EP23726724.0A patent/EP4500367A1/en active Pending
Patent Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090287669A1 (en) * | 2008-05-13 | 2009-11-19 | Bennett James D | Image search engine using context screening parameters |
| US20120254076A1 (en) * | 2011-03-30 | 2012-10-04 | Microsoft Corporation | Supervised re-ranking for visual search |
| US20120269116A1 (en) * | 2011-04-25 | 2012-10-25 | Bo Xing | Context-aware mobile search based on user activities |
| US9773102B2 (en) * | 2011-09-09 | 2017-09-26 | Microsoft Technology Licensing, Llc | Selective file access for applications |
| US20160019618A1 (en) * | 2013-05-13 | 2016-01-21 | A9.Com, Inc | Augmented reality recommendations |
| US20150033164A1 (en) * | 2013-07-26 | 2015-01-29 | Samsung Electronics Co., Ltd. | Method and apparatus for providing graphic user interface |
| US20220138404A1 (en) * | 2013-08-12 | 2022-05-05 | Microsoft Technology Licensing, Llc | Browsing images via mined hyperlinked text snippets |
| US20160162591A1 (en) * | 2014-12-04 | 2016-06-09 | Microsoft Technology Licensing, Llc | Web Content Tagging and Filtering |
| US20210208741A1 (en) * | 2017-09-13 | 2021-07-08 | Google Llc | Efficiently augmenting images with related content |
| US20200250453A1 (en) * | 2019-01-31 | 2020-08-06 | Adobe Inc. | Content-aware selection |
| US20220335242A1 (en) * | 2021-04-19 | 2022-10-20 | Apple Inc. | Visual search for electronic devices |
Non-Patent Citations (5)
| Title |
|---|
| Cooke, "How to use Google Lens (How to get it and what it can do)," Oct. 1, 2020, retrieved from https://www.youtube.com/watch?v=ymSTtysmC6l, transcript, 39 pages. |
| Cooke, "How to Use Google Lens," Genealogy Gems, Oct. 3, 2020, retrieved from https://lisalouisecooke.com/2020/10/03/how-to-use-google-lens. |
| Daniel Carlos Guimarães Pedronette et al., "Exploiting contextual information for image re-ranking and rank aggregation", Mar. 13, 2012, Springer, pp. 115-128 (Year: 2012). * |
| European Office Action from European Patent Application No. 23726724.0, dated Jun. 18, 2025, 8 pages. |
| International Search Report and Written Opinion from PCT/US2023/020280, dated Jul. 12, 2023. 12 pages. |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4500367A1 (en) | 2025-02-05 |
| US20230350941A1 (en) | 2023-11-02 |
| KR20250002579A (en) | 2025-01-07 |
| CN119110946A (en) | 2024-12-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10061985B2 (en) | Video understanding platform | |
| US11580144B2 (en) | Search indexing using discourse trees | |
| US10042892B2 (en) | Question answer system using physical distance data | |
| US10552759B2 (en) | Iterative classifier training on online social networks | |
| US9275272B2 (en) | Tag suggestions for images on online social networks | |
| US20190188285A1 (en) | Image Search with Embedding-based Models on Online Social Networks | |
| US9684695B2 (en) | Ranking test framework for search results on an online social network | |
| US20140280093A1 (en) | Social entity previews in query formulation | |
| CN108292309A (en) | Use deep learning Model Identification content item | |
| US12013906B2 (en) | Client-side personalization of search results | |
| US20150072335A1 (en) | System and method for providing augmentation based learning content | |
| US20240242499A1 (en) | Visual search for electronic devices | |
| US20220382803A1 (en) | Syndication of Secondary Digital Assets with Photo Library | |
| US11693541B2 (en) | Application library and page hiding | |
| US11681718B2 (en) | Scoping a system-wide search to a user-specified application | |
| US12282728B2 (en) | Automatic text recognition with layout preservation | |
| US12386886B2 (en) | Contextual text lookup for images | |
| US11080349B1 (en) | Geo-encoded embeddings | |
| WO2023212248A1 (en) | Contextual text lookup for images | |
| US9135313B2 (en) | Providing a search display environment on an online resource | |
| US20220391434A1 (en) | Centralized on-device image search | |
| US20240403478A1 (en) | Privacy-preserving presentation of content item bundles | |
| WO2023233204A1 (en) | Automatic text recognition with layout preservation | |
| WO2024253974A1 (en) | Privacy-preserving presentation of content item bundles |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| AS | Assignment |
Owner name: APPLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZHAO, YANG;SHAH, PULAH J.;DIXON, RYAN S.;SIGNING DATES FROM 20221012 TO 20221020;REEL/FRAME:061652/0011 |
|
| STCV | Information on status: appeal procedure |
Free format text: APPEAL BRIEF (OR SUPPLEMENTAL BRIEF) ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |