
HK1248329B - Computer-implemented method and system for disambiguating travel queries - Google Patents


Info

Publication number
HK1248329B
Authority
HK
Hong Kong
Prior art keywords
travel
items
cluster
item
travel items
Prior art date
Application number
HK18107474.0A
Other languages
Chinese (zh)
Other versions
HK1248329A1 (en)
Inventor
翁德雷‧琳达
K‧M‧陈
普拉沙恩斯‧科特‧普拉卡萨姆
阿南斯‧林加姆内尼
沙恩‧威廉‧迈里克
桑瓦‧西姆弗卡威
Original Assignee
Expedia, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/811,446 external-priority patent/US10360276B2/en
Application filed by Expedia, Inc. filed Critical Expedia, Inc.
Publication of HK1248329A1 publication Critical patent/HK1248329A1/en
Publication of HK1248329B publication Critical patent/HK1248329B/en


Description

Computer-implemented method and system for disambiguating travel queries
Background
Computing devices and computing networks are often used by users to obtain information about various topics. Because of the wide range of information they provide, many network-based systems enable users to search for information on those systems. Typically, a user is able to submit a structured or unstructured query to a network-based system, which then attempts to match the query to a set of potential search results. Unstructured queries (e.g., free-form text strings) are often popular with users due to their ease of use. However, due to their unstructured nature, these queries are often at least partially ambiguous, and therefore result in at least partially irrelevant search results being provided to the user.
For example, a user interested in locating information about hotel accommodations in Springfield, Missouri, may search a network-based service using the query string "Hotel in Springfield". However, the network-based service may not be able to determine from the text of the query which "Springfield" is of interest to the user (e.g., Springfield, Missouri; Springfield, Massachusetts; Springfield, Oregon; etc.), or whether "Springfield" refers to a city in the United States at all. Thus, merely performing the query and providing a set of results is likely to present irrelevant information to the user, resulting in a reduction in user satisfaction with the system and an increase in the computing resources required to service the user.
Drawings
FIG. 1 is a schematic block diagram of an illustrative network environment in which a travel service may be used to enable a user to submit a query for travel items and disambiguate the user query based on geographic clustering of results;
FIG. 2 is an illustrative block diagram of a query processing system included in the travel service of FIG. 1 that may process a user query to detect potential ambiguities based on geographic clustering of results;
FIG. 3 is an illustrative user interface that may be displayed on a user computing device to inform a user of a potentially ambiguous query and to provide results of the query based on detected ambiguities;
FIG. 4 is an illustrative visualization of geographic clusters of search results, which may indicate ambiguities in respective queries; and
FIGS. 5A and 5B are flow diagrams for processing a user-submitted query that may include ambiguities, for detecting such ambiguities based on geo-clustering of results, and for providing the results of the query to the user.
Detailed Description
In general, aspects of the present disclosure are directed to processing a query submitted by a user that may include ambiguous terms or search terms, and to resolving the ambiguities in order to provide meaningful and accurate search results to the user. More specifically, aspects of the present disclosure utilize geo-clustering of search results based on the location of goods or services (such as hotel accommodations) to determine a set of potential regions or zones in which valid results may be located, and notify consumers of the various regions or zones. Illustratively, a user of a U.S.-based web travel service may submit a query for "hotels in Springfield". The travel service may determine that the valid results of the query may be located in Springfield, Missouri; Springfield, Massachusetts; or Springfield, Oregon. Thus, the user may be prompted to select the correct location for the desired hotel stay. As will be described below, the travel service may determine a set of regions that include potentially valid results without literally mapping the terms of the query to a particular region (e.g., without detecting that "Springfield" denotes a city and that cities of that name exist in multiple states). In this way, the travel service may be able to disambiguate any free-form query. For example, a query for "beach hotels" may result in potential results being detected in various locations with beaches (e.g., Hawaii, California, Florida, etc.). Similarly, a query for "Edgewater hotel" may result in the detection of potential hotel accommodations both in Seattle, Washington, the location of a hotel named "Edgewater Hotel," and in the city of Edgewater, New Jersey. As described in detail below, the travel service may thus be able to process user queries more quickly and accurately, thereby improving the user experience and reducing the computing resources required to utilize the network travel service.
To determine the set of regions or zones in which potentially valid results for a query are located, the travel service may utilize result clustering. In particular, the travel service may first execute a query to determine a set of results (e.g., according to an algorithm described below, a conventional search algorithm known in the art, or any other search algorithm). The travel service may then determine the geographic location of each search result. For example, the travel service may assign a location to each result in the set using known address data or latitude and longitude data. The travel service may then cluster the results according to their assigned locations. In one embodiment, the results may be clustered based on a simple distance calculation, such that two results within a threshold distance of each other (which may be predefined by the travel service) are placed within the same "cluster," while results that are more than the threshold distance away from any existing cluster are placed in a new cluster. Other clustering algorithms, such as "k-means" clustering, are known in the art and may be used to determine clustering of results.
After determining the set of clusters of results, the travel service can assign an identifier to each cluster. In one embodiment, the identifier may be based on existing metadata about each result in the cluster. For example, the travel service may maintain multiple geographic identifiers for each result that specify a city, county, region, state, country, etc. for that result. Each cluster can then be identified based on the most specific geographic identifier shared by all results in the cluster. Thus, a cluster of results spread across Seattle, Washington may be identified as "hotels in Seattle, Washington," while a cluster of results spread across Rhode Island may be identified as "hotels in Rhode Island".
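The identifier assignment described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `LEVELS` hierarchy, the dictionary layout of each result, and the function name are assumptions made for the example.

```python
# Hypothetical hierarchy of geographic levels, ordered from most to least specific.
LEVELS = ("city", "county", "state", "country")


def most_specific_shared_identifier(results):
    """Return (level, value) for the narrowest level on which every result agrees.

    `results` is a list of dicts mapping level names to identifier strings
    (an assumed layout for the per-result geographic metadata).
    """
    if not results:
        return None
    for level in LEVELS:
        values = {r.get(level) for r in results}
        # All results share this level only if exactly one non-missing value exists.
        if len(values) == 1 and None not in values:
            return level, values.pop()
    return None


seattle_cluster = [
    {"city": "Seattle", "county": "King", "state": "Washington", "country": "US"},
    {"city": "Seattle", "county": "King", "state": "Washington", "country": "US"},
]
rhode_island_cluster = [
    {"city": "Providence", "county": "Providence", "state": "Rhode Island", "country": "US"},
    {"city": "Newport", "county": "Newport", "state": "Rhode Island", "country": "US"},
]
```

With these sample clusters, the first resolves at the city level and the second only at the state level, mirroring the Seattle and Rhode Island examples in the text.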
Thus, information about the determined set of clusters may be presented to the user in response to the user's query. For example, a query for "hotels in Springfield" may cause the travel service to return an indication that the potential results are located in Springfield, Missouri; Springfield, Oregon; or Springfield, Massachusetts, and cause the travel service to request that the user select one of those regions in order to display the hotel accommodations in the selected region. Thus, a user who originally intended to search for a hotel stay in Springfield, Oregon, would not be presented with a hotel stay in Springfield, Missouri (which might otherwise be more prominently displayed in the collection of search results). In addition, because result-based clustering has disambiguated the query, the travel service does not have to attempt to implement a separate system to detect ambiguities in search criteria before conducting a search. Cluster-based disambiguation thus presents a significant improvement over other disambiguation techniques (e.g., scanning each query to detect a predetermined list of potentially ambiguous terms or phrases).
In some embodiments, the travel service may be configured to distinguish between clusters that may represent areas of interest to the user and clusters that may include particular items of interest to the user. For example, a query for "Edgewater hotel" may result in two clusters: the first includes the Edgewater Hotel in Seattle, Washington and other nearby hotels in Seattle, and the second includes various hotels in Edgewater, New Jersey. Although each hotel in the first cluster may be accurately described as a "hotel in Seattle, Washington," such a description may not be deemed useful to users searching for a particular "Edgewater hotel" (because such results may appear more ambiguous than the initial search). Thus, rather than presenting the user with the region corresponding to the first cluster, the travel service may detect that the Edgewater Hotel included in the cluster is a particular item of interest to the user. The travel service may then present the user with a first cluster corresponding to the Seattle Edgewater Hotel and allow the user to select the first cluster to view the hotel accommodations at the Edgewater Hotel.
In some cases, in addition to or instead of explicitly asking the user to select a cluster, the user's history or other user profile data may be used to distinguish clusters. For example, where a travel service has a record that a user previously purchased a travel item in a particular geographic region for a particular date, the travel service may automatically disambiguate additional queries by the user for that particular date by selecting clusters corresponding to that geographic region. As another example, where a travel service has a record of a user's preference for a particular region (e.g., an explicit indication of the preference, or a preference indicated by repeated past purchases, etc.), the travel service may automatically disambiguate the user's query by selecting a cluster corresponding to the particular region. In some cases, rather than automatically selecting the cluster corresponding to the preferred region, the travel service may prioritize the clusters using the user's preference for the region (e.g., such that the cluster corresponding to the preferred region is displayed above the alternative clusters).
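The prioritization variant described above (preferred regions displayed first, other clusters kept in their existing order) can be sketched as a stable sort. The cluster layout and the `preferred_regions` set are assumptions for illustration only.

```python
def prioritize_clusters(clusters, preferred_regions):
    """Move clusters in a user's preferred regions to the front.

    `clusters` is a list of dicts with a "region" key (assumed layout);
    `preferred_regions` is a set of region names taken from a hypothetical
    user profile. Python's sort is stable, so the original relative order
    is preserved within the preferred and non-preferred groups.
    """
    return sorted(clusters, key=lambda c: c["region"] not in preferred_regions)


clusters = [
    {"region": "Springfield, Missouri"},
    {"region": "Springfield, Oregon"},
    {"region": "Springfield, Massachusetts"},
]
ordered = prioritize_clusters(clusters, {"Springfield, Oregon"})
```

Automatic selection, the other option the text mentions, would amount to taking only the first element when it belongs to a preferred region.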
One of ordinary skill in the art will appreciate that the use of geo-clustering to disambiguate a search query represents a significant advantage over previous implementations. In particular, the use of geo-clustering enables a reduction in the number of irrelevant results presented to a user, thereby enabling the user to more quickly and efficiently search for and locate items of interest. Thus, using geo-clustering to disambiguate queries can enable a travel service to run more efficiently, enabling the travel service to return relevant query results more quickly and with less computing resources. Moreover, disambiguating a search query using geo-clustering does not require independent analysis of the query itself (e.g., by attempting to match query terms to a known list of potentially ambiguous terms), but rather utilizes various search results that match the query in order to detect ambiguity. Thus, the embodiments described herein represent a significant advance over existing systems.
Although examples are provided herein with respect to particular types of travel services, such as hotel accommodations, embodiments of the present disclosure may be applied to any geographic-based item or service, including but not limited to flights, accommodations, other transportation, activities, tours, travel insurance, day trips, destination services, or combinations thereof.
FIG. 1 is a block diagram depicting an illustrative operating environment in which a web-based travel service 150 enables a customer to browse, search for, and retrieve travel items offered by third party vendors or the operator of the travel service 150. As shown in FIG. 1, the operating environment includes one or more reservation systems 120 and one or more user computing devices 110 in communication with a network-based travel service 150 via a network 130. A third party provider using reservation system 120 may make travel items or information about travel items available to travel service 150 via network 130. Travel service 150 may then make the travel item, as well as other travel items, available to user computing device 110. Thus, a prospective traveler using user computing device 110 may browse travel items available in travel service 150, search for travel items, and obtain, book, or reserve one or more desired travel items.
The user computing device 110 may be any computing device, such as a laptop or tablet computer, a personal computer, a server, a Personal Digital Assistant (PDA), a hybrid PDA/mobile phone, a mobile phone, an e-book reader, a set-top box, a camera, a digital media player, and so forth. The reservation system 120 and the user computing device 110 may communicate with the travel service 150 via the network 130. Those skilled in the art will appreciate that the network 130 may be any wired network, wireless network, or combination thereof. Further, the network 130 may be a personal area network, a local area network, a wide area network, a cable network, a satellite network, a cellular telephone network, or a combination thereof. In the illustrated embodiment, the network 130 is the internet. Protocols and components for communicating via the internet or any other above-mentioned type of communication network are known to those skilled in the art of computer communication and therefore need not be described in more detail here.
The reservation system 120 may correspond to any system or device configured to or capable of allowing booking, reservation, or acquisition of travel items. For example, the reservation system 120 may correspond to a Central Reservation System (CRS), a Global Distribution System (GDS), or any other system in which multiple travel item providers (e.g., airlines, hotels, car rental agencies, cruise ships, bus services, etc.) make travel items available for reservation and/or purchase. In other embodiments, the reservation system 120 may correspond to a system provided by an individual travel item provider (e.g., a particular airline, hotel or hotel chain, car rental company, cruise ship, bus service, etc.). In general, each reservation system may enable other network-based devices (e.g., devices of travel service 150) to query for information about travel items (e.g., availability, price, travel plans, etc.), search for travel items, and book, acquire, or reserve travel items. The operation of reservation systems is known in the art and will therefore not be described in detail herein.
In the illustrated embodiment, travel service 150 is shown as a computer environment including several computer systems interconnected using one or more networks. More specifically, the travel service 150 may include a user interface system 156, a reservation system interface 152, a query processing system 154, a traveler profile data store 158, and a travel item data store 160. Although shown as distinct systems in FIG. 1, in some embodiments, one or more of the user interface system 156, reservation system interface 152, query processing system 154, traveler profile data store 158, and travel item data store 160 can be combined into one or more aggregate systems. Further, those skilled in the art will appreciate that the travel service 150 may have fewer or more components than those shown in FIG. 1, including various Web services and/or peer-to-peer network configurations. In some embodiments, one or more components of travel service 150 may be implemented by a virtual machine implemented in a hosted computing environment. The hosted computing environment may include one or more rapidly provisioned and released computing resources, which may include computing, networking, and/or storage devices. A hosted computing environment may also be referred to as a cloud computing environment. Accordingly, the depiction of the travel service 150 in FIG. 1 should be considered exemplary and not limiting to the present disclosure.
Any one or more of the user interface system 156, reservation system interface 152, query processing system 154, traveler profile data store 158, and travel item data store 160 can be embodied in a plurality of components, each of which executes an instance of a respective one of the user interface system 156, reservation system interface 152, query processing system 154, traveler profile data store 158, and travel item data store 160. The servers or other computing components that implement any of the user interface system 156, reservation system interface 152, query processing system 154, traveler profile data store 158, and travel item data store 160 can include network interfaces, memories, processing units, and computer-readable media drives, all of which can communicate with each other via a communications bus. The network interface may provide a connection to the network 130 and/or other networks or computer systems. The processing unit may be in communication with a memory containing program instructions that the processing unit executes in order to operate the corresponding user interface system 156, reservation system interface 152, query processing system 154, traveler profile data store 158, or travel item data store 160. The memory may generally include RAM, ROM, other persistent and secondary memory, and/or any non-transitory computer-readable medium.
In accordance with embodiments of the present disclosure, the query processing system 154 may be configured to process the query and determine a set of query results (e.g., presented to the user computing device 110 via the user interface system 156). More specifically, the query processing system 154 may utilize the geographic-based clusters to detect ambiguous queries and present an indicator of each result cluster to the user computing device 110 (e.g., as an identifier of the geographic area in which the cluster is located or an identifier of a particular item in the cluster that may be of interest to the user). As described below, the query processing system 154 may perform additional pre- or post-processing on the query to increase the accuracy of the results.
In some embodiments, the query processing system 154 may utilize information obtained from the reservation system 120 about travel items. Interaction with the reservation system 120 can be facilitated through a reservation system interface 152, which can enable querying the reservation system 120 for information about related travel items, retrieving information about travel items from the reservation system 120, and reserving travel items on behalf of a user. In some embodiments, multiple reservation system interfaces 152 may be provided, each configured to interact with one or more particular reservation systems 120. For example, a first reservation system interface 152 may interact with an airline-based reservation system 120, while a second reservation system interface 152 may interact with a hotel-based reservation system 120. Embodiments of systems and methods for interacting with the reservation system 120 are described in U.S. patent application No. 12/470,442, filed on May 21, 2009 and entitled "OPTIMIZED SYSTEM AND METHOD FOR FINDING AN OPTIMAL FARE," which is incorporated herein by reference in its entirety.
The query processing system 154 may also utilize information from the travel item data store 160, which may include various data about related travel items. The travel item data store may correspond to any physical data store, collection of physical data stores, or virtual data store implemented by one or more physical data stores, such as a Hard Disk Drive (HDD), a Solid State Drive (SSD), a tape drive, a Network Attached Storage (NAS), or any other persistent or substantially persistent storage component. In some cases, data (e.g., name, address, types of reservations available, etc.) in the travel item data store 160 may be obtained from the reservation system 120 via the reservation system interface 152. The travel item data store 160 may also include information generated by the travel service 150 itself or submitted to the travel service 150 by the user computing device 110, such as ratings, pictures, or opinions of various travel items. The travel item data store 160 may also include information retrieved or submitted by third party services, such as a separate travel item rating service or travel agency (not shown in FIG. 1). In some cases, due to communication delays with the reservation system 120, the query processing system 154 may utilize information from the travel item data store 160 exclusively or preferentially over information retrieved directly from the reservation system 120. As such, in some cases, the travel item data store 160 can be used to cache information from the reservation system 120 or a different system (not shown in FIG. 1) for use by the query processing system 154.
Queries to the query processing system 154 may be submitted by the user computing device 110 via the user interface system 156, which may also send the query results to the user computing device 110. Thus, the user interface system 156 may assist the user in searching, browsing, and retrieving (e.g., by booking, reserving, etc.) travel items via the user computing device 110. In some embodiments, the user interface system 156 may include a network server (e.g., a web server) for generating instructions for presentation by a recipient device of a network page (e.g., a web page) that facilitates such searching, browsing, and retrieval. One example of a user interface for which the user interface system 156 may generate instructions will be described in more detail below with reference to FIG. 3.
The user interface system 156 may also be configured to store, retain, and retrieve information from the traveler profile data store 158. The traveler profile data store 158 may correspond to any persistent or substantially persistent data store, such as one or more Hard Disk Drives (HDDs), Solid State Drives (SSDs), or Network Attached Storage (NAS). The traveler profile data store 158 can store information about the user (e.g., user name, age, address, date of birth, credit card information, purchase history and travel reservations, frequent flyer or rewards program information, etc.) for use by the travel service 150. As discussed below, information from the traveler profile data store 158 can be used to determine one or more preferred regions of the user submitting the query, which can help disambiguate the query.
More detail regarding the query processing system 154 is shown in FIG. 2, which FIG. 2 is a block diagram depicting exemplary components of the query processing system 154. Each of the components 202-210 may represent a different computing device within the query processing system 154 (including at least one physical processor configured to execute computer-executable instructions), or may represent a program, function, code module, or other code executed by at least one processor of one or more computing devices included within the query processing system 154. For purposes of illustration, these components 202-210 will be discussed with respect to processing of an example query that is input (e.g., by the user computing device 110) to the query processing system 154.
The request received by the query processing system 154 may be initially communicated to the query annotator 202, where individual terms of the query may be examined and annotated for later use within the query processing system 154. In some embodiments, query annotator 202 can utilize natural language processing techniques (e.g., as known in the art) to determine whether any term or series of terms (sometimes referred to as an "n-gram") can be classified as a predefined term type. By way of illustration, the term types can include (but are not limited to): the type of travel item sought (e.g., hotel, car, flight, activity, etc.), the name of the particular item sought (e.g., particular hotel name, hotel brand, etc.), the geographic location of the item sought (e.g., city, county, etc.), a geographic descriptor of the geographic location (e.g., beach, mountain range, etc.), the amenities offered by the desired travel item (e.g., swimming pool, hot tub, catering, fitness center, spa, etc.), a service level descriptor (e.g., luxury, business, economy, etc.), or a particular part of speech (e.g., noun, verb, participle, pronoun, preposition, adverb, conjunction, etc.). As described below, the query processing system 154 may use the term annotations to determine which terms should be queried at a given stage within the query processing system 154. For example, the query processing system 154 may ignore desired amenities (e.g., swimming pools, spas, catering) when attempting to locate a geographic region that may include the searched travel items, because those terms are unlikely to be geographically meaningful. As another example, the query processing system 154 may ignore certain parts of speech that are not likely to be meaningful during processing of the query, such as prepositions.
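As a rough sketch of the annotate-then-filter idea above, the following uses a small lookup table in place of the natural language processing the text describes. The `TERM_TYPES` lexicon, the type labels, and both function names are hypothetical and exist only for this example.

```python
# Hypothetical lexicon standing in for an NLP-based annotator.
TERM_TYPES = {
    "hotel": "item_type",
    "hotels": "item_type",
    "pool": "amenity",
    "spa": "amenity",
    "beach": "geo_descriptor",
    "luxury": "service_level",
    "in": "preposition",
    "near": "preposition",
    "with": "preposition",
}


def annotate(query):
    """Tag each whitespace-separated term with a term type ("unknown" if unlisted)."""
    return [(term, TERM_TYPES.get(term, "unknown")) for term in query.lower().split()]


def geographic_terms(annotated, ignored=("amenity", "preposition")):
    """Drop term types that are unlikely to be geographically meaningful."""
    return [term for term, term_type in annotated if term_type not in ignored]


annotated = annotate("Luxury hotels in Springfield with pool")
kept = geographic_terms(annotated)
```

The dropped terms are not discarded in this design; a later stage could still use them, for instance to rank individual items by amenity.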
The query processing system 154 may then transmit the annotated query to a search engine 204, which is operable to locate travel items potentially relevant to the query. In one embodiment, the search engine 204 may rank and sort potential travel items using simple term matching, such that travel items whose descriptions more closely match the terms of the query are returned in response to the query and ranked based on the level of correspondence between their descriptions and the terms. In another embodiment, the search engine 204 may use historical user activity data (e.g., "click-through" data) to determine travel items that are responsive to a given query. For example, for a given hotel query, the search engine 204 may determine a set of hotels that previous users selected when utilizing the same query or terms within the same query. The search engine 204 may also rank the set of hotels based on the frequency of previous user selections. In one embodiment, the search engine 204 may associate search terms with a particular travel item by using a naive Bayesian classifier. Using clickstream data to determine relevant query results may enable the search engine 204 to identify relevant results even when the search query does not share terms with the resulting travel items. For example, although the term "Big Apple" (a nickname for New York City) does not actually appear in the description of a particular hotel, the clickstream data may indicate that users selected a particular New York City hotel after searching for "hotel in Big Apple". In some embodiments, clickstream data may be generated by interactions between the user computing device 110 and the travel service 150 (e.g., as a result of a search made by the user on the travel service 150).
In another embodiment, clickstream data may be inferred or collected when users are referred to the travel service 150 by a third party search engine (not shown), which may inform the travel service 150 of the specific user queries utilized by the third party search engine to direct users to the travel service 150. Additionally or alternatively, the search engine 204 may utilize any number of search algorithms known in the art to locate travel items that are potentially relevant to the user's query.
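The click-based ranking described above can be illustrated with simple selection-frequency counting. This sketch deliberately uses raw counts rather than the naive Bayesian classifier mentioned in the text, and the click log format, item names, and function names are assumptions.

```python
from collections import Counter, defaultdict


def build_click_index(click_log):
    """Map each query term to a Counter of items selected after queries containing it.

    `click_log` is an assumed list of (query_string, selected_item) pairs.
    """
    index = defaultdict(Counter)
    for query, item in click_log:
        for term in set(query.lower().split()):
            index[term][item] += 1
    return index


def rank_by_clicks(index, query):
    """Rank items by how often past users selected them for the query's terms."""
    totals = Counter()
    for term in set(query.lower().split()):
        totals.update(index.get(term, {}))
    # Highest count first; ties broken alphabetically for a deterministic order.
    return [item for item, _ in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))]


click_log = [
    ("hotel in big apple", "Hotel A (New York)"),
    ("big apple hotel", "Hotel A (New York)"),
    ("big apple hotel", "Hotel B (New York)"),
    ("hotel in springfield", "Hotel C (Springfield)"),
]
index = build_click_index(click_log)
ranking = rank_by_clicks(index, "big apple")
```

Neither hypothetical New York hotel needs "Big Apple" in its description to be returned, which is the property the text attributes to clickstream-based matching.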
In some cases, the search engine 204 may utilize only a subset of terms from the query that are expected to be relevant to the geographic cluster of potential results. For example, to identify initial results, the search engine 204 may ignore terms within the query that specify a desired amenity. These terms may later be used by the query processing system 154 to rank individual travel items before presenting the travel items to the user, as described below. In some embodiments, search engine 204 may also process or modify the search results. For example, the search engine 204 may refine the search results such that the results include only the top n travel items that are potentially relevant to the user query.
After determining the set of initial search results, the query processing system 154 may transmit the initial search results to the region identifier 206, which may cluster the results into a set of regions. In one embodiment, clustering may occur based on a single-pass algorithm such that a travel item is determined to fall within a previously identified cluster if the location associated with the travel item is within a threshold distance (e.g., 50 miles) of the previously identified cluster, and is otherwise determined to constitute a new cluster. In the event that the travel item is associated with a single geographic location (e.g., a location of a hotel offering accommodation, a location of a destination service offering, a location of a car rental company), the region identifier 206 can use the single geographic location as the location of the travel item. In the case where a travel item is associated with multiple geographic locations (e.g., a flight having both a departure location and a return location), the travel item may be considered to be within a threshold distance of a cluster if any of the multiple geographic locations are within the threshold distance of the cluster. The distance to the cluster may be determined based on any location in the cluster. In some embodiments, a travel item may be considered to be within a threshold distance of a pre-existing cluster if its location is within a threshold distance of the location of any other travel item already in the cluster. In other embodiments, a travel item may be considered to be within a threshold distance of a pre-existing cluster if the location of the travel item is within a threshold distance of the centroid of the cluster (e.g., determined from the average location of each travel item already within the cluster). 
Such a single-pass clustering algorithm may be beneficial because it may be quickly utilized by the query processing system 154 to determine clusters related to geographically dispersed items. However, more complex clustering algorithms (e.g., k-means clustering, as known in the art) may also be used by the region identifier 206 to determine clusters of travel items.
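A minimal sketch of the single-pass, centroid-based variant described above follows. The haversine great-circle formula, the coordinate values, the item names, and the arithmetic-mean centroid (an approximation that ignores longitude wraparound) are assumptions chosen for the example rather than details taken from the disclosure.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def centroid(points):
    """Arithmetic mean of latitudes and longitudes (adequate for nearby points)."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def single_pass_cluster(items, threshold_miles=50.0):
    """Assign each (name, (lat, lon)) item to the first cluster whose centroid is
    within the threshold; otherwise start a new cluster."""
    clusters = []
    for name, location in items:
        for cluster in clusters:
            center = centroid([loc for _, loc in cluster])
            if haversine_miles(location, center) <= threshold_miles:
                cluster.append((name, location))
                break
        else:  # no existing cluster was close enough
            clusters.append([(name, location)])
    return clusters


# Approximate, illustrative coordinates for hypothetical hotels.
items = [
    ("Hotel MO 1", (37.2090, -93.2923)),   # Springfield, Missouri
    ("Hotel MA 1", (42.1015, -72.5898)),   # Springfield, Massachusetts
    ("Hotel OR 1", (44.0462, -123.0220)),  # Springfield, Oregon
    ("Hotel MO 2", (37.1500, -93.3000)),   # a few miles from Hotel MO 1
]
clusters = single_pass_cluster(items)
```

Each item is examined once, so the cost grows with the number of items times the number of clusters found so far, which is consistent with the text's point that this approach is quick for geographically dispersed results.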
After identifying the plurality of clusters, the region identifier 206 may also assign a geographic identifier to each cluster. In one embodiment, the region identifier 206 may use geographic metadata (e.g., travel item data store 160 from FIG. 1) available to the query processing system 154 to identify the most specific geographic identifier applicable to all (or a threshold percentage) of the hotels in the cluster. Such metadata may include an identification of a set of regions, each identifying a geographic region associated with a particular region identifier. In one embodiment, the metadata may be generated based on existing geopolitical boundaries, such as block, city, county, or state boundary lines. In further embodiments, the metadata may be generated manually by travel service 150. In still other embodiments, the travel service 150 can obtain metadata identifying the geographic identifier and the corresponding geographic region or location from a third party service known in the art, such as a Geographic Information Service (GIS). Using such geographic metadata, the region identifier 206 may compare the locations of the travel items in any given cluster to determine identifiers for one or more geographic regions that include the locations of all or a threshold percentage of the travel items in the cluster. Thus, clusters of accommodation of hotels located within the Fremont (Fremont) block of seattle, washington may be identified as "Fremont, seattle, washington clusters, while clusters of accommodation of hotels located at Spokane, washington may be identified as" Spokane, washington clusters. In some embodiments, the travel service 150 may also include an associated geographic identifier in the metadata of the travel item. For example, one or more travel items may be classified as "near Space Needle Tower" (a watchtower in Seattle, washington). 
Where the travel items of a cluster are included within multiple regions (e.g., relevant identifiers, neighborhoods, and cities), the travel service 150 may select from the multiple regions based on a priority rating for each region. Illustratively, regions of the travel service 150 may be prioritized based on a total area included within the region, based on a popularity of the region on the travel service 150 (e.g., based on a frequency with which users search for travel items corresponding to the region), or both. For example, where the travel items of a cluster are located in both the "city center in Seattle, Washington" region and the "Seattle, Washington" region, the travel service 150 may use "city center in Seattle, Washington" as the identifier for the cluster because that region has a smaller size and a greater popularity among the users of the travel service 150. Additionally or alternatively, the travel service 150 may select from a plurality of regions based on a correspondence between terms used in the user query and an identifier for each region. For example, a cluster resulting from a search for "hotels near the Space Needle" may be identified as "hotels near the Space Needle" rather than "Seattle, Washington" at least in part because of the overlap of terms between the search query and the region identifier.
Additionally, the region identifier 206 may rank the determined clusters to establish a predicted relevance of each cluster to the search query. In one embodiment, the clusters may be ranked based on the travel items determined to fall within the clusters. For example, an initial ranking of each travel item (e.g., as determined by the search engine 204) may be utilized to determine an overall or average ranking of the travel items in each cluster (such that clusters that include travel items ranked highly by the search engine 204 are placed above clusters that include travel items not ranked highly by the search engine 204). In another embodiment, clusters may be ranked based at least in part on a correspondence between the geographic identifiers assigned to the clusters and the query, such that a cluster identified as hotels geographically "near the Space Needle" is ranked highly for queries that include the term "Space Needle". In still other embodiments, the clusters may be ranked based on the geographic identifier of the cluster itself (e.g., such that geographic identifiers that are more popular with users of the travel service 150 are ranked higher than less popular geographic identifiers, or such that more specific geographic identifiers are ranked higher than less specific geographic identifiers), or based on user preferences for geographic identifiers corresponding to at least one cluster (e.g., as determined from the user's profile data).
Thereafter, the query processing system 154 can communicate the identified clusters to the region classifier 208, which can attempt to distinguish clusters that are relevant due to their geographic location from clusters that are relevant due to a particular travel item included in the cluster. For example, a query for "Edgewater hotel" may result in two clusters: the first cluster includes the Edgewater Hotel in Seattle, Washington and other nearby hotels, and the second cluster includes hotels within and around Edgewater, New Jersey. Of these, the first cluster is expected to be relevant based on a particular travel item (the Edgewater Hotel) included in the cluster. Thus, rather than presenting the user with a geographic identifier (e.g., "Hotels in Seattle, Washington"), the cluster may be represented as the particular travel item in the cluster in which the user is expected to be interested (e.g., "Edgewater Hotel in Seattle, Washington"). User selection of the cluster will then yield detailed information about the particular travel item. Conversely, the second cluster (hotels within and around Edgewater, New Jersey) may be represented as a user-selectable region (e.g., "Edgewater, New Jersey") that is selectable to view a list of hotel accommodations within the region.
To distinguish clusters that are relevant by virtue of a particular travel item from clusters that are relevant as regions, the region classifier 208 may utilize the relative rankings of individual travel items in a cluster as assigned by the search engine 204 (e.g., based on click stream data, term analysis, or any other search and ranking criteria). Where the highest ranked travel item in a given cluster appears to be much more relevant than the second highest ranked travel item, the cluster is likely relevant due to the highest ranked travel item. Such a cluster may be represented to the user as the travel item itself (e.g., as a link to view additional information about that particular travel item). Conversely, where all travel items in a cluster are similarly ranked, the cluster may be presented to the user as a geographic identifier (e.g., a particular city), which may be selected to view travel items within the city. The region classifier 208 may utilize any number of comparisons between items of a given cluster in order to determine whether a single travel item of the cluster represents a particular item of interest to a consumer. Where the individual travel items in a cluster have been assigned scores (e.g., by the search engine 204), examples of comparisons between the items of the cluster include, but are not limited to: a score difference between the first and last ranked travel items in the cluster (e.g., as measured by absolute differences, log domain differences, etc.); a score difference between the first ranked and second ranked travel items in the cluster; a ratio of the difference in score between the first and second ranked items in the cluster to the difference in score between the second and third ranked items in the cluster; and a ratio of the difference in score between the first and second ranked items in the cluster to the difference in score between the second and last ranked items in the cluster.
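The score comparisons listed above can be computed directly from the per-item scores of a cluster. The sketch below is illustrative only; the function and key names are hypothetical, and the handling of a zero denominator (returning `None`) is an assumption rather than something the disclosure specifies.

```python
def _safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def score_gap_features(scores):
    """Compute the four score comparisons for one cluster's item scores."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return {}  # a single item offers nothing to compare against
    first, second, last = ranked[0], ranked[1], ranked[-1]
    features = {
        "first_minus_last": first - last,
        "first_minus_second": first - second,
    }
    if len(ranked) >= 3:
        third = ranked[2]
        features["gap12_over_gap23"] = _safe_ratio(first - second, second - third)
        features["gap12_over_gap2last"] = _safe_ratio(first - second, second - last)
    return features
```

A large `first_minus_second` value, or a large ratio, suggests that one travel item dominates the cluster; values near zero suggest that the cluster is better presented by its geographic identifier.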
Other factors that the region classifier 208 may use to distinguish cluster types (e.g., region-related clusters or clusters that are relevant by virtue of a particular travel item) include the number of clusters generated from the top n search results of the query, the average distance between extracted clusters (e.g., determined based on the centers of the clusters, the edges of the clusters, etc.), the maximum distance between any two extracted clusters, and the minimum distance between any two extracted clusters.
Additionally or alternatively, the types of clusters (e.g., region-related clusters or clusters that are relevant by virtue of a particular travel item) can be distinguished based on a comparison of the query terms entered by the user and terms associated with the highest-ranked travel items in the clusters. For example, where the search query and the top ranked travel item share similar terms (e.g., a query for "Edgewater hotel" and a travel item named "Edgewater Hotel"), the cluster is likely to be relevant by virtue of the top ranked travel item. Conversely, where the search query does not share similar terms with a highly ranked travel item (e.g., a query for "hotels in Seattle, Washington" and a highly ranked travel item "Edgewater Hotel"), the cluster may be relevant as a whole, and the user may benefit from viewing the hotels within the geographic identifier associated with the cluster. In some cases, the similarity of terms between the query and a travel item may be quantified as a ratio of the number of matching terms (or "n-grams") between the query and the travel item identifier (e.g., name) to the total number of terms (or "n-grams") in the query. Thus, in one embodiment, the region classifier 208 may determine that a cluster is relevant by virtue of a particular travel item when that travel item has a proportion of terms matching the user's query that exceeds a threshold amount. In some embodiments, cluster types may also be distinguished based on a comparison between the query terms and the region identifier for the cluster. For example, when the name of a region closely matches one or more terms in a query, the cluster can be determined to be relevant by virtue of its corresponding region. Additionally, cluster types may be distinguished based on the particular terms or term types used in the query.
Illustratively, a region identifier for a cluster may be more likely than a particular travel item identifier to be relevant to a query containing plural terms (e.g., "hotels" rather than "hotel"). In some embodiments, each of the above metrics may be quantified, and additional metrics may be derived from such quantified comparisons. For example, the query processing system 154 may assign a score to the similarity of the region identifier of the cluster to the terms used in the query, and a score to the similarity of highly ranked search results in the cluster to those terms. Additional metrics may then be created by comparing the two scores. Additional metrics are possible and contemplated within the scope of the present disclosure. The particular metric (or combination thereof) used to distinguish cluster types may in some cases be determined using a machine learning algorithm that processes a training data set that includes queries and corresponding result clusters of known relevance types (e.g., relevance based on geographic identifiers or relevance based on highly ranked travel items in the clusters).
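The term-overlap ratio described above (matching n-grams divided by the total n-grams in the query) might be sketched as follows. The function names, the whitespace tokenizer, and the default threshold of 0.5 are assumptions chosen for illustration.

```python
def ngrams(text, n=1):
    """Split text on whitespace and return its lowercase word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_ratio(query, item_name, n=1):
    """Fraction of the query's n-grams that also occur in the item name."""
    query_grams = ngrams(query, n)
    if not query_grams:
        return 0.0
    item_grams = set(ngrams(item_name, n))
    matched = sum(1 for gram in query_grams if gram in item_grams)
    return matched / len(query_grams)


def is_item_driven(query, top_item_name, threshold=0.5):
    """Treat the cluster as item-driven when the ratio exceeds the threshold."""
    return match_ratio(query, top_item_name) > threshold
```

Because the comparison here is exact, "hotels" does not match "Hotel"; a fuller implementation would likely stem or otherwise normalize terms first.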
The query processing system 154 may then communicate the identified and classified clusters to the results coordinator 210. If multiple clusters have been identified, the results coordinator 210 can send a request (e.g., via the user interface system 156) to the user to select one of the clusters, each identified by either a geographic identifier or the top-ranked travel item in the cluster. If the user selects a cluster identified by a highly ranked travel item (e.g., a hotel), the user may be immediately directed to a detail page that allows reservations to be made for the hotel. If the user selects a cluster identified by a geographic identifier, the results coordinator 210 can present a list of travel items available in the cluster. In some cases, the results coordinator 210 can re-sort or re-rank the travel items in the cluster. For example, where query terms tagged as amenities (e.g., spas, fitness centers, etc.) were previously ignored by the search engine 204, the results coordinator 210 may re-evaluate travel items in the cluster based on the availability of those amenities for each travel item. In some cases, the results coordinator 210 may conduct a new search limited to the geographic area identified by the cluster to determine the set of travel items presented to the consumer.
In the event that the query processing system 154 previously identified only a single cluster, the results coordinator 210 may proceed as if the user had selected the single cluster. Thus, if a single cluster is determined to be relevant to a query based on a high ranked travel item, the user may be directed directly to a detailed page for that travel item. If a single cluster is determined to be related based on a geographic region, the results coordinator 210 may present a list of travel items in the cluster (e.g., potentially reclassified or re-ranked based on the initial query), or may present results of a new search for travel items within the geographic region.
In the manner described above, the query processing system 154 may process the unstructured query to determine potential geographic ambiguities within the query and allow the user to select a desired travel item or geographic region. Thus, the query processing system 154 can improve the speed, efficiency, and accuracy of searching for information about travel items over conventional search algorithms.
Referring to FIG. 3, an illustrative example of a user interface 300 that enables disambiguation of search queries utilizing geographic clusters is shown. In this example, the user interface 300 is generated by a browser application (e.g., a "web browser") executing on the user computing device 110 based on instructions (e.g., hypertext markup language [HTML] files) received from the travel service 150. The user interface 300 may enable the user computing device 110 to: submit a query to the travel service 150, select an appropriate resulting cluster if the query is determined to be ambiguous, view and acquire available travel items in response to the submitted travel query, and view travel package recommendations generated based at least in part on the submitted travel query. In the example shown in FIG. 3, a user identified as "Tom Traveler" has accessed the travel service 150. To enable the user to navigate among the various interfaces of the travel service 150, the user interface includes a navigation panel 302 that directs the user to various other features provided by the travel service 150. Illustratively, the text elements within the navigation panel 302 may correspond to interactive links that, when selected, modify or change the user interface. In the current example, the link 304 "hotels" has been selected by Tom Traveler. Based on the selection, the user interface system 156 has returned the contents of the user interface 300. The user may view an alternative user interface by selecting an alternative link within the navigation panel 302. For example, a user may view alternative interfaces related to other types of travel items by selecting the portions of the navigation panel entitled "packages," "cars," or "flights."
Using the user interface 300, the user computing device 110 can submit a query to the travel service 150 using the search box 306. In the current example, the user has submitted the unstructured query text "Hotel in Washington". Upon receiving the query, the travel service 150 may transmit the query text (along with any other additional information, such as the user's profile information) to the query processing system 154. As described above, the query processing system 154 may locate a set of results for the query (e.g., using term matching, click stream classification, or another search algorithm) and process the results to cluster the travel items included in the results geographically.
One example of potential geographic clusters for the query "Hotel in Washington" is shown in FIG. 4. As shown in FIG. 4, potential travel items related to the query "Hotel in Washington" may be located throughout the Washington State region 400. It should be noted that the particular clusters shown in FIG. 4 are illustrative only, and that potential travel items related to the query "Hotel in Washington" may also include travel items located outside of the Washington State region 400 (e.g., in Washington, D.C.). To disambiguate the query, the query processing system 154 may use a clustering algorithm (e.g., those described above) to cluster the results into three clusters 402A, 402B, and 402C. Although only three clusters are shown in FIG. 4, any number of clusters may be identified by the query processing system 154. In some cases, the query processing system 154 may "prune" the clusters, such that clusters with low potential relevance (e.g., as determined by the total or average score of the travel items in the cluster) are not selected for further processing, thus reducing the number of clusters presented to the user. In the example of FIG. 4, each cluster is associated with a predetermined geographic region of the state of Washington, such that each travel item in the cluster is associated with that geographic region. In particular, the travel items within cluster 402A are shown as being associated with the "Seattle area" region; the travel items within cluster 402B are shown as being associated with the "Spokane" region; and the travel items within cluster 402C are shown as being associated with the "Yakima Valley" region.
Since most travelers have a relatively specific travel location in mind, it is unlikely that a user will benefit from having hotels from each of the clusters 402A-402C presented at the same time. Additionally, if the user is actually interested in traveling to a less popular or lower ranked region, it is likely that the initial set of results selected from the clusters 402A-402C will not include any travel items from the user's desired region, thus requiring the user to request additional search results (e.g., by moving to a subsequent "page" of results), narrow their search terms (e.g., by manually specifying the region in which they are interested), or submit a different search query. Each of these alternatives is detrimental to the user experience and reduces the speed, accuracy, and efficiency with which the travel service 150 can operate.
To address these issues, aspects of the present disclosure present each of the three clusters 402A-402C to the user as a group rather than as individual travel items in the cluster. As described above, the query processing system 154 may be configured to identify the expected relevance of each of the clusters 402A-402C as being relevant by its geographic location or by the highly ranked travel items within the clusters 402A-402C. In the example of fig. 4, assume that the query processing system 154 determines each of the clusters 402A-402C as being related by its geographic location. Thus, to disambiguate the query "Hotel in Washington," the travel service 150 may request that the user select one of the geographic zones represented by the clusters 402A-402C.
An example of a request to select from among the identified clusters is shown in FIG. 3. In particular, FIG. 3 includes a prompt 308 generated by the travel service 150 to request that the user select from three geographic identifiers 310-314, which identify the regions in which the travel items resulting from the query "Hotel in Washington" are located. Each of the geographic identifiers 310-314 corresponds to one of the clusters 402A-402C shown in FIG. 4 (where identifier 310, "Hotels in Seattle, Washington," corresponds to cluster 402A; identifier 312, "Hotels in Spokane, Washington," corresponds to cluster 402B; and identifier 314, "Hotels in Yakima," corresponds to cluster 402C). Through the user interface 300, a user may select one of the geographic identifiers 310-314 to receive additional information about the selected identifier. In one embodiment, selection of an identifier may result in display of the set of travel items included in the cluster represented by the identifier. In further embodiments, selection of an identifier may result in a new query being generated on behalf of the user (e.g., sharing the terms of the previously submitted unstructured query, but limited to the region identified by the selected geographic identifier). Various systems and methods for displaying travel item query results are known in the art and will not be described in detail herein.
Although each of the clusters 402A-402C is represented by a geographic identifier in FIG. 3, in some embodiments a cluster may instead be represented by a highly ranked travel item located in the cluster (e.g., by the name of that travel item). Moreover, different types of identifiers for the identified clusters can be intermixed within the same prompt. For example, a query for "Edgewater Hotel" may result in a request to select between "Edgewater Hotel in Seattle, Washington" (representing a cluster that is relevant based on a travel item ranked highly in the cluster) and "Hotels in Edgewater, New Jersey" (representing a cluster that is relevant based on the geographic region within which the travel items in the cluster are located). In the event that the user selects an identifier corresponding to a particular travel item (e.g., the Edgewater Hotel in Seattle, Washington), the user may be automatically directed to an information page associated with the particular travel item (which may enable the user to book, reserve, or otherwise acquire the travel item).
Referring to FIGS. 5A and 5B, one illustrative routine 500 will be described for disambiguating a search query based on geographic clustering of potential results. For example, the illustrative routine 500 may be performed by the query processing system 154 in conjunction with additional components of the travel service 150. The routine 500 begins at block 502, where the query processing system 154 receives a travel item query. Illustratively, the query may be received from a user using the user computing device 110 of FIG. 1 via a submission to the user interface 300 of FIG. 3. The routine 500 continues at block 504, where the query processing system 154 may pre-process, filter, and/or normalize the query for further processing. Illustratively, block 504 may include preparing the query for further use by the query processing system 154 by removing capitalization, excess whitespace, or other ignored formatting or characters, applying a predefined regular expression to the query, grouping multiple words into a single term (e.g., grouping the words "new" and "york" into a single term "new york"), or otherwise performing known processing on the query.
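The normalization at block 504 can be illustrated with a short sketch. The phrase list, the punctuation-stripping regular expression, and the function names are hypothetical choices for this example; the disclosure does not specify them.

```python
import re

# Hypothetical list of multi-word terms that should stay together.
MULTIWORD_TERMS = ("new york", "space needle")


def group_terms(tokens, phrases=MULTIWORD_TERMS):
    """Greedily merge adjacent tokens that form a known multi-word term."""
    phrase_tokens = sorted((p.split() for p in phrases), key=len, reverse=True)
    grouped, i = [], 0
    while i < len(tokens):
        for phrase in phrase_tokens:
            if tokens[i:i + len(phrase)] == phrase:
                grouped.append(" ".join(phrase))
                i += len(phrase)
                break
        else:
            grouped.append(tokens[i])
            i += 1
    return grouped


def normalize_query(query):
    """Lowercase, strip punctuation and extra whitespace, then group terms."""
    text = query.lower()
    text = re.sub(r"[^\w\s]", " ", text)      # replace punctuation with spaces
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return group_terms(text.split())
```

For instance, `normalize_query("  Hotels in   New York! ")` yields the three terms `"hotels"`, `"in"`, and `"new york"`.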
Thereafter, the routine 500 continues at block 506, where the query terms may be annotated to identify a "type" of each term, e.g., a type of travel item sought (e.g., hotel, car, flight, activity, etc.), a name or descriptor of the particular item sought (e.g., a particular hotel name, hotel brand, etc.), a geographic location of the item sought (e.g., city, county, etc.), a geographic descriptor of the geographic location (e.g., beach, mountain range, etc.), an amenity provided by the desired travel item (e.g., swimming pool, hot tub, dining, fitness center, spa, etc.), a service level descriptor (e.g., luxury, business, economy, etc.), or a particular part of speech (e.g., noun, verb, participle, pronoun, preposition, adverb, conjunction, etc.). In one embodiment, block 506 may include executing, by the query processing system 154, a natural language processing algorithm that processes the query to annotate individual terms in the query with the determined term types. In some embodiments, before the routine 500 continues, the terms of the query annotated with one or more predefined types (such as prepositions) may be removed from the query. In other embodiments, terms of the query annotated with one or more predefined types may remain within the query but be ignored in certain portions of the routine 500. For example, the query processing system 154 may ignore terms annotated as "amenities" when determining the initial set of potential results from which to derive geographic clusters.
At block 508, the query processing system 154 implements a search for a set of potential results for the query. In one embodiment, the query processing system 154 may use term matching to determine travel items whose descriptors include terms that are also in the query. In another embodiment, the query processing system 154 may use historical usage data of users of the travel service 150 (e.g., as represented by "click stream data" obtained by the travel service 150). In one embodiment, the query processing system 154 may use naive Bayesian classification to rank or score travel items according to the query. For example, the query processing system 154 may attempt to determine, for any given travel item, a probability that a user would select the travel item in response to the query. The probability may be expressed in the form $P(T \mid Q)$, where $T$ represents the travel item and $Q$ represents the query. The probability can be expressed by the following formula:
$$P(T \mid Q) = P(Q \mid T) \cdot P(T)$$
where $P(T)$ represents the probability of the given travel item being selected, and can be calculated as the total number of user selections of the travel item within the relevant data divided by the total number of queries within the relevant data. Accordingly, the function $P(Q \mid T)$ can be calculated according to the following equation:
$$P(Q \mid T) = P(W_1 \mid T) \cdot P(W_2 \mid T) \cdots P(W_n \mid T)$$
where $W_1, \ldots, W_n$ represent the series of terms (or "n-grams") in the query, and $P(W_n \mid T)$ represents the percentage of all queries resulting in selection of the travel item $T$ in which the individual term $W_n$ appears. Thus, by analyzing click stream data to determine a correspondence between past queries and individual travel items, the query processing system 154 may score or rank individual travel items in response to newly submitted queries.
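A minimal sketch of this scoring, computed from a click stream of (query, selected travel item) pairs, is given below. The data layout and function names are assumptions for illustration. The score follows the two formulas above as written, so it omits the normalizing term P(Q), which is the same for every travel item and therefore does not affect ranking; no smoothing is applied, so a query term never seen with an item drives that item's score to zero.

```python
from collections import Counter, defaultdict


def train(clickstream):
    """Count item selections and, per item, the queries containing each word."""
    item_counts = Counter()
    word_counts = defaultdict(Counter)
    for query, item in clickstream:
        item_counts[item] += 1
        # Count each word once per query, matching the definition of P(W|T).
        for word in set(query.lower().split()):
            word_counts[item][word] += 1
    return item_counts, word_counts, len(clickstream)


def score(query, item, model):
    """Return P(Q|T) * P(T) for the given query and travel item."""
    item_counts, word_counts, total_queries = model
    selections = item_counts[item]
    if selections == 0 or total_queries == 0:
        return 0.0
    probability = selections / total_queries  # P(T)
    for word in query.lower().split():
        probability *= word_counts[item][word] / selections  # P(W|T)
    return probability


# Hypothetical click stream data for illustration.
CLICKSTREAM = [
    ("edgewater hotel", "Edgewater Seattle"),
    ("edgewater seattle", "Edgewater Seattle"),
    ("edgewater hotel", "Edgewater NJ Inn"),
    ("spokane hotel", "Davenport"),
]
```

Ranking the candidate travel items for a new query then amounts to sorting them by `score(query, item, model)` in descending order.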
Various additional mechanisms and algorithms for locating travel items in response to queries are known in the art and may be used in addition to or instead of those described above. In some cases, the query processing system 154 may communicate with different systems in order to determine search results for a query. For example, the query processing system 154 may communicate with a third party search system (e.g., a system provided by the reservation system 120 or another system not shown in the figures) to determine search results for use with the routine 500. Thus, the search mechanisms and algorithms described above are intended to be illustrative in nature and not exhaustive.
Thereafter, the routine 500 continues in FIG. 5B at block 512. In particular, at block 512, the query processing system 154 uses the previously established set of search results to determine one or more clusters of travel items responsive to the query based on the geographic location of each travel item. Illustratively, the geographic location may be established based on latitude and longitude coordinates of the travel item, an address of the travel item, or another indicator of the travel item's geographic location. The geographic location of the travel items may be predetermined by the travel service 150 or may be determined by the query processing system 154 during implementation of the routine 500 (e.g., using information retrieved from the travel item data store 160, the reservation system 120, or other systems). The query processing system 154 may cluster the travel items based on various types of clustering algorithms. In one embodiment, the query processing system 154 utilizes a single-pass clustering algorithm, such that a given travel item is identified as being associated with an existing cluster if the travel item is located within a predetermined distance (e.g., 100 kilometers) of the existing cluster (e.g., as represented by a center point of the cluster, a boundary of the cluster, or one or more travel items in the cluster), and is otherwise determined to constitute a new cluster. In another embodiment, the query processing system 154 utilizes a k-means clustering algorithm. Various additional clustering algorithms are known in the art and will therefore not be described in detail herein.
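The single-pass clustering embodiment can be sketched as follows, with each cluster represented by its center point. The haversine distance, the running arithmetic mean used for the center (an approximation that is reasonable over short distances), and all names are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def single_pass_cluster(points, max_km=100.0):
    """Assign each point to the first cluster whose center is within max_km."""
    clusters = []  # each cluster: {"center": (lat, lon), "members": [points]}
    for point in points:
        for cluster in clusters:
            if haversine_km(point, cluster["center"]) <= max_km:
                cluster["members"].append(point)
                count = len(cluster["members"])
                # Recompute the center as the mean of the member coordinates.
                cluster["center"] = (
                    sum(m[0] for m in cluster["members"]) / count,
                    sum(m[1] for m in cluster["members"]) / count,
                )
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append({"center": point, "members": [point]})
    return clusters
```

Each travel item is examined exactly once, which keeps the cost low, although the resulting clusters can depend on the order in which the items are processed.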
In some embodiments, the query processing system 154 further refines the clusters by removing clusters that are unlikely to be relevant to the user's search. In one embodiment, clusters are refined based on their scores, as determined by at least one of an average score or a total score of the travel items included in the clusters. Illustratively, a score for each individual travel item may be determined during the initial search described above with respect to block 508. For example, the query processing system 154 may prune (e.g., ignore) clusters whose scores represent less than a threshold percentage of the total score of all determined clusters. As another example, the query processing system 154 may rank the clusters by score and prune any clusters below a predetermined rank.
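Both pruning strategies can be expressed in a few lines. The sketch below assumes each cluster has already been reduced to a single non-negative score; the function names and default thresholds are hypothetical.

```python
def prune_by_share(cluster_scores, min_share=0.1):
    """Keep clusters whose score is at least min_share of the total score."""
    total = sum(cluster_scores.values())
    if total <= 0:
        return {}
    return {label: value for label, value in cluster_scores.items()
            if value / total >= min_share}


def prune_by_rank(cluster_scores, max_clusters=3):
    """Keep only the max_clusters highest-scoring clusters."""
    ranked = sorted(cluster_scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:max_clusters])
```

For example, with scores of 60, 35, and 5 and `min_share=0.1`, the third cluster holds only 5% of the total score and is dropped.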
Thereafter, the routine 500 continues at block 514, where the query processing system 154 classifies the clusters to determine their expected relevance. In one embodiment, the query processing system 154 may attempt to determine whether a given cluster is likely to be relevant based on a geographic identifier (e.g., a particular neighborhood, city, etc.) attributed to the query, or whether the cluster is likely to be relevant due to a particular travel item included in the cluster. As described above, the query processing system 154 may distinguish the type of a cluster based on a number of criteria, including but not limited to: similarity between the terms of the query and the terms of the top-ranked travel items in the cluster; the number of travel items in the cluster that are included in the top n overall search results for the query; a score difference between the first ranked and the last ranked travel items in the cluster; a score difference between the first ranked and second ranked travel items in the cluster; a ratio of the difference in score between the first and second ranked items in the cluster to the difference in score between the second and third ranked items in the cluster; and a ratio of the difference in score between the first and second ranked items in the cluster to the difference in score between the second and last ranked items in the cluster. In one embodiment, machine learning techniques may be used to determine the specific criteria for distinguishing the types of clusters.
Thereafter, the routine 500 continues at block 516, where the query processing system 154 may present a representation of each identified cluster to the user. One example of a user interface presenting a representation of a cluster is disclosed above with respect to fig. 3. Illustratively, where a cluster is determined to be relevant based on a geographic region corresponding to the cluster, the cluster may be presented to the consumer by a geographic identifier associated with the cluster (e.g., a block, city, or region in the cluster where all or a threshold percentage of the travel items are located). In the event that a cluster is determined to be relevant based on the highest ranked travel item in the cluster, the cluster may be presented to the user as a name or other identifier for the highest ranked travel item.
At block 518, the query processing system 154 receives a user selection of the presented cluster (e.g., as input through a browser application executing on the user computing device 110). Illustratively, selecting a presented cluster may include selecting a hyperlink (e.g., a link in an HTML file) that corresponds to the presented cluster. Thereafter, at block 520, the query processing system 154 displays the results of the selected cluster (e.g., by returning an HTML file or portion of an HTML file corresponding to the selected link). Illustratively, selecting a cluster represented by a geographic identifier (e.g., a neighborhood, city, or region identifier) may result in displaying travel items that match the query submitted by the user and that are located within the geographic identifier. Similarly, selecting a cluster represented by an identifier of an individual travel item (e.g., the name of a travel item ranked high in the cluster) may result in displaying information about the individual travel item and an interface that enables a user to reserve or book the travel item. Thus, by selecting the clusters identified by the query processing system 154, the user may be presented with information regarding the particular, disambiguated portion of the results that were initially identified in response to their potentially ambiguous query. The routine 500 may then end at block 522.
The various illustrative logical blocks, routines, and algorithms described in connection with the embodiments disclosed herein may be implemented as electronic hardware or computer software executing on electronic hardware. To illustrate this, various illustrative components, blocks, modules, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software executed via hardware depends upon the particular application and design constraints imposed on the overall system. The described functionality may be implemented in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method, process, routine, or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may generally correspond to a set of computer-executable instructions for enabling a computing device to perform desired functions. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of non-transitory computer-readable storage medium. An example storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
Conditional language such as "may," "might," "for example," and the like, as used herein, is generally intended to convey that certain embodiments include (and other embodiments do not include) certain features, elements and/or steps, unless expressly stated otherwise or understood otherwise in the context of such usage. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment. The terms "comprising," "including," "having," and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and the like. Furthermore, the term "or" is used in its inclusive sense (and not in the exclusive sense), so that when used, for example, to connect a list of elements, the term "or" means one, some, or all of the elements in the list.
Unless specifically stated otherwise, disjunctive language such as the phrase "at least one of X, Y, or Z" is understood in the context of common usage to mean that an item, term, etc. can be X, Y, or Z or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to imply, and should not imply, that certain embodiments require at least one X, at least one Y, or at least one Z to each be present.
While the above detailed description has shown, described, and pointed out novel features as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the device or algorithm illustrated may be made without departing from the spirit of the disclosure. It will be recognized that certain embodiments of the present invention described herein may be embodied within a form that does not provide all of the features and benefits set forth herein, as some features may be used or practiced separately from others. The scope of certain inventions disclosed herein is indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (23)

1. A computer-implemented method for disambiguating travel queries, the computer-implemented method comprising:
as implemented by one or more computing devices configured with certain executable instructions,
receiving a travel item query from a user computing device associated with a user;
identifying a plurality of travel items in response to the travel item query;
determining a geographic location for each of the plurality of travel items;
clustering the plurality of travel items according to their respective geographic locations to identify at least two clusters of travel items, each of the at least two clusters of travel items comprising a set of travel items from the plurality of travel items, wherein clustering the plurality of travel items comprises:
identifying a physical distance between the geographic locations of respective ones of a first subset of the travel items within the plurality of travel items to identify a first cluster of travel items that identifies the respective ones of the first subset of travel items as being within a first distance of each other; and
identifying a physical distance between the geographic locations of respective ones of a second subset of the travel items within the plurality of travel items to identify a second cluster of travel items that identifies the respective ones of the second subset of travel items as being within a second distance of each other;
determining an expected relevance of each of the at least two travel item clusters based at least in part on the geographic region to which each travel item cluster corresponds;
sending a geographic identifier for each of the at least two travel item clusters to the user computing device for presentation to the user;
receiving, from the user computing device, a selection of one of the at least two travel item clusters; and
sending information relating to a set of travel items included in the selected cluster of travel items to the user computing device for presentation to the user.
2. The computer-implemented method of claim 1, wherein the travel items include at least one of accommodation, ground transportation, activities, tours, day trips, or destination services.
3. The computer-implemented method of claim 1, wherein the travel item query is a free-form text query.
4. The computer-implemented method of claim 1, further comprising:
identifying one or more terms in the travel item query that correspond to a predetermined part of speech; and
removing the one or more terms from the travel item query.
5. A computer-implemented system for disambiguating travel queries, comprising:
a data store comprising geographic locations for a plurality of travel items;
a computing device in communication with the data store, the computing device configured with computer-executable instructions that, when executed, cause the computing device to at least:
receive a query from another computing device;
identify the plurality of travel items in the data store in response to the query;
cluster the plurality of travel items according to geographic locations for each of the plurality of travel items to identify at least two clusters of travel items, each of the at least two clusters of travel items comprising a set of travel items from the plurality of travel items, wherein clustering the plurality of travel items comprises:
identifying a physical distance between the geographic locations of respective ones of a first subset of the travel items within the plurality of travel items to identify a first cluster of travel items that identifies the respective ones of the first subset of travel items as being within a first distance of each other; and
identifying a physical distance between the geographic locations of respective ones of a second subset of the travel items within the plurality of travel items to identify a second cluster of travel items that identifies the respective ones of the second subset of travel items as being within a second distance of each other;
transmit an identifier for each of the at least two travel item clusters to the other computing device;
receive, from the other computing device, a selection of one of the at least two travel item clusters; and
send information related to a set of travel items included in the selected cluster of travel items to the other computing device.
6. The system of claim 5, wherein the computer-executable instructions further cause the computing device to at least determine the expected relevance for a first one of the at least two clusters of travel items based at least in part on a geographic region corresponding to the first cluster of travel items, and wherein the identifier for the first cluster of travel items is an identifier of the geographic region.
7. The system of claim 6, wherein the computer-executable instructions further cause the computing device to identify at least the geographic region to which the first cluster of travel items corresponds based at least in part on a geographic location of each travel item in the first cluster of travel items.
8. The system of claim 6, wherein the computer-executable instructions further cause the computing device to identify at least the geographic zone to which the first travel item cluster corresponds from a plurality of predetermined geographic zones identified in the data store.
9. The system of claim 6, wherein the computer-executable instructions cause the computing device to at least identify the plurality of travel items in the data store that are responsive to the query based at least in part on processing of one or more terms of the query according to a naive Bayesian classifier.
10. The system of claim 9, wherein the naive Bayesian classifier is based at least in part on clickstream data that associates one or more terms of the query with the plurality of travel items.
11. A non-transitory computer-readable medium comprising computer-readable instructions that, when executed by a computing system comprising a processor, cause the computing system to at least:
receive a query from a computing device;
identify a plurality of travel items in response to the query, wherein each travel item of the plurality of travel items is associated with a geographic location;
cluster the plurality of travel items according to their respective geographic locations to identify at least two travel item clusters, each of the at least two travel item clusters comprising a set of travel items from the plurality of travel items, wherein clustering the plurality of travel items comprises:
identifying a physical distance between the geographic locations of respective ones of a first subset of the travel items within the plurality of travel items to identify a first cluster of travel items that identifies the respective ones of the first subset of travel items as being within a first distance of each other; and
identifying a physical distance between the geographic locations of respective ones of a second subset of the travel items within the plurality of travel items to identify a second cluster of travel items that identifies the respective ones of the second subset of travel items as being within a second distance of each other;
send an identifier for each of the at least two travel item clusters to the computing device;
receive, from the computing device, a selection of one of the at least two travel item clusters; and
send information related to a set of travel items included in the selected cluster of travel items to the computing device.
12. The non-transitory computer-readable medium of claim 11, wherein the geographic location comprises at least one of geographic coordinates corresponding to a travel item or address information for a travel item.
13. The non-transitory computer-readable medium of claim 11, wherein the computer-executable instructions further cause the computing device to determine an expected relevance for a first travel item cluster of the at least two travel item clusters based at least in part on a single travel item of the first travel item cluster, and wherein the identifier for the first travel item cluster is an identifier of the single travel item.
14. The non-transitory computer-readable medium of claim 11, wherein the computer-executable instructions cause the computing device to cluster the plurality of travel items according to their geographic locations to identify the at least two travel item clusters at least in part by:
for each travel item of the plurality of travel items:
determining whether the location of the travel item is within a threshold distance from any previously established clusters;
if the travel item is determined to be within a threshold distance from a previously established cluster, including the travel item in the previously established cluster; and
if the travel item is determined not to be within a threshold distance from any previously established cluster, establishing a new cluster that includes the travel item.
15. The non-transitory computer-readable medium of claim 11, wherein the computer-executable instructions further cause the computing device to at least rank the set of travel items included in the selected travel item cluster, and wherein the information related to the set of travel items included in the selected travel item cluster presents the travel items according to their respective rankings.
16. The non-transitory computer-readable medium of claim 15, wherein the computer-executable instructions further cause the computing device to at least:
identify one or more terms in the query that correspond to a desired amenity; and
rank the set of travel items included in the selected travel item cluster based at least in part on the desired amenity.
17. A computer-implemented method, comprising:
under the control of a processor executing specific computer-executable instructions,
receiving a query from a computing device;
identifying a plurality of travel items in response to the query, wherein each travel item of the plurality of travel items is associated with a geographic location;
clustering the plurality of travel items according to their respective geographic locations to identify at least two travel item clusters, each of the at least two travel item clusters comprising a set of travel items from the plurality of travel items, wherein clustering the plurality of travel items comprises:
identifying a physical distance between the geographic locations of respective ones of a first subset of the travel items within the plurality of travel items to identify a first cluster of travel items that identifies the respective ones of the first subset of travel items as being within a first distance of each other; and
identifying a physical distance between the geographic locations of respective ones of a second subset of the travel items within the plurality of travel items to identify a second cluster of travel items that identifies the respective ones of the second subset of travel items as being within a second distance of each other;
determining an identifier for each of the at least two travel item clusters;
identifying a travel item cluster of the at least two travel item clusters based at least in part on the determined identifiers for each of the at least two travel item clusters; and
sending, to the computing device, information regarding a set of travel items included in the selected cluster of travel items.
18. The computer-implemented method of claim 17, further comprising:
identifying one or more terms in the travel item query that correspond to the geographic location;
wherein identifying a plurality of travel items in response to the travel item query comprises identifying a plurality of travel items corresponding to the geographic location.
19. The computer-implemented method of claim 17, further comprising:
identifying one or more terms in the travel item query that correspond to a desired amenity; and
wherein identifying a plurality of travel items in response to the travel item query comprises ignoring one or more terms in the travel item query that correspond to a desired amenity.
20. The computer-implemented method of claim 19, wherein identifying one or more terms in the travel item query that correspond to the desired amenity comprises applying natural language processing to the travel item query.
21. The computer-implemented method of claim 17, further comprising determining an expected relevance for a first travel item cluster of the at least two travel item clusters based at least in part on a single travel item of the first travel item cluster, and wherein the identifier for the first travel item cluster is an identifier of the single travel item.
22. The computer-implemented method of claim 17, wherein identifying a travel item cluster of the at least two travel item clusters based at least in part on the determined identifier of each travel item cluster of the at least two travel item clusters comprises:
transmitting, to the computing device, an identifier of each of the at least two travel item clusters; and
receiving, from the computing device, a selection of the travel item cluster.
23. The computer-implemented method of claim 17, wherein identifying a travel item cluster of the at least two travel item clusters based at least in part on the determined identifier of each travel item cluster of the at least two travel item clusters comprises:
determining at least one preferred geographic region of a user associated with the computing device based at least in part on a history of travel items acquired by the user; and
determining a location of a travel item cluster corresponding to the preferred geographic region.
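The threshold-distance clustering recited in claim 14 can be illustrated with a minimal sketch. It is one possible reading, not the claimed implementation: the claims do not specify how the distance from a travel item to a previously established cluster is measured, so the sketch assumes an item is within range of a cluster when it is within the threshold of any item already in that cluster, and it assumes great-circle (haversine) distance between latitude/longitude pairs. The function names, coordinates, and the 5 km threshold are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cluster_items(items, threshold_km):
    """Single pass over items: join the first cluster in range, else start one."""
    clusters = []
    for name, location in items:
        for cluster in clusters:
            # Assumption: in range of a cluster means in range of any member.
            if any(haversine_km(location, other) <= threshold_km
                   for _, other in cluster):
                cluster.append((name, location))
                break
        else:
            # No previously established cluster is in range: establish a new one.
            clusters.append([(name, location)])
    return clusters


items = [
    ("Hotel A", (47.6062, -122.3321)),  # Seattle
    ("Hotel B", (47.6101, -122.3421)),  # Seattle, under 1 km from Hotel A
    ("Hotel C", (45.5152, -122.6784)),  # Portland
]
clusters = cluster_items(items, threshold_km=5.0)
```

With these sample inputs the two Seattle items fall into one cluster and the Portland item establishes a second. Because the pass is greedy, the result can depend on the order in which items are visited.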
HK18107474.0A 2015-07-28 2016-07-13 Computer-implemented method and system for disambiguating travel queries HK1248329B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14/811,446 2015-07-28
US14/811,446 US10360276B2 (en) 2015-07-28 2015-07-28 Disambiguating search queries
PCT/US2016/042145 WO2017019304A1 (en) 2015-07-28 2016-07-13 Disambiguating search queries

Publications (2)

Publication Number Publication Date
HK1248329A1 (en) 2018-10-12
HK1248329B (en) 2023-04-14
