WO2016069767A1 - Classifier recall estimation for sparse topics - Google Patents
- Publication number
- WO2016069767A1 (PCT/US2015/057851)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- documents
- classifier
- document
- users
- constraints
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0255—Targeted advertisements based on user history
Definitions
- the instant specification relates to digital information retrieval in the context of social platforms.
- In the field of information retrieval, search engines typically implement indexing and retrieval, responsive to search queries, of network-accessible documents.
- documents are units of content and can be, for example, Web pages, text files, Portable Document Format files, video files, audio files, image files, and social messages of a social platform.
- different types of documents will exhibit different features, which in general are programmatically detectable properties of a document that are computationally quantifiable.
- a particular text file with its particular string of text (often parsable into meaningful terms), for example, will exhibit features commonly referred to as term frequency, document length, and inverse term frequency.
- a particular photo with its particular captured image will exhibit features relating to color, opacity, and texture.
- documents sharing common features are indexed into a same corpus, which facilitates programmatic comparisons of one document with another (i.e., it's easier for a computing resource to compare apples to apples).
- a document may describe more than one topic. For example, a photo of a celebrity in front of a famous landmark can be associated with both the celebrity and the landmark.
- Classification is typically implemented with some combination of programmatic resources and human judgment, especially when the former are leveraging machine learning, which requires training in order to be effective.
- the programmatic resource takes as input selected features of a document and applies machine learning to determine whether that document can be classified as one describing the topic that the classifier is assigned to detect.
- a sampling of the classifier's decisions is evaluated by human operators, and those evaluations are applied to assess the effectiveness of the classifier and to calibrate its machine-learning parameters.
- One measure of effectiveness of a classifier is recall, which, for the implementations in which the classifier is tasked with assigning topics to documents, is its ability to properly classify all documents of a corpus that describe the topic it is assigned to detect.
- a second classifier (C2) is constructed to be sufficiently independent from the classifier being evaluated (C1), where both classifiers are constructed to detect the same sparse topic of a particular corpus of documents (U).
- independence generally means that the classifiers rely on different features, i.e., features that are not interwoven.
- perfect independence means orthogonal features in feature space, and perfect dependence means having features on a same line in feature space.
- the second classifier C2 is then used as a proxy for the corpus U in the evaluation of C1's ability to recall the sparse topic. Instead of having to sample from the entire corpus U to evaluate C1's recall, one has to sample only from the set of documents selected by C2, which is a significantly smaller subset than U; hence the sample size needed is significantly reduced.
- perfect independence need not be implemented, as imperfect independence can yield acceptable accuracy in estimating C1's recall. That is, C2 can optionally be constructed with features that are not independent of, but only distantly related to, the features used by C1.
- one or more embodiments relate to a method for document retrieval.
- the method includes classifying, by a first classifier, documents to obtain a first set of resulting documents, and classifying, by a second classifier, the documents to obtain a second set of resulting documents.
- the method further includes estimating a recall metric of the first classifier based on the first set of resulting documents and the second set of resulting documents, revising the first classifier based on the recall metric, and classifying, by the revised classifier, the documents to obtain at least one additional document excluded from the first set of documents.
- the method further includes sending a communication based on the classification.
- one or more embodiments relate to a system for recall estimation that includes a computing processor and instructions that, when executed on the computing processor, classify, by a first classifier, documents to obtain a first set of resulting documents, and classify, by a second classifier, the documents to obtain a second set of resulting documents.
- the instructions further estimate a recall metric of the first classifier based on the first set of resulting documents and the second set of resulting documents, revise the first classifier based on the recall metric, and classify, by the revised classifier, the documents to obtain at least one additional document excluded from the first set of documents.
- the instructions further send a communication based on the classification.
- one or more embodiments relate to a non-transitory computer readable medium comprising computer readable program code that classifies, by a first classifier, documents to obtain a first set of resulting documents, and classifies, by a second classifier, the documents to obtain a second set of resulting documents.
- the computer readable program code further estimates a recall metric of the first classifier based on the first set of resulting documents and the second set of resulting documents, revises the first classifier based on the recall metric, and classifies, by the revised classifier, the documents to obtain at least one additional document excluded from the first set of documents.
- the computer readable program code further sends a communication based on the classification.
- Some implementations of the invention require a significantly smaller sample size than conventional systems to effectively evaluate a classifier's ability to recall sparse topics. Because sampling requires human judgment, the significantly smaller sample size advantageously reduces the time and cost burden to a level that is not prohibitive and hence allows one to perform the described evaluation even for a corpus that has billions of documents.
- FIGs. 1A, 1B, and 2 show diagrams of a system in accordance with one or more embodiments of the invention.
- FIGs. 3 and 4 show flowcharts in accordance with one or more embodiments of the invention.
- FIG. 5 shows an example in accordance with one or more embodiments of the invention.
- FIG. 6 shows a computing system in accordance with one or more embodiments of the invention.
- ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers is not to imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as by the use of the terms "before", "after", "single", and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements.
- a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.
- embodiments of the invention are directed to document retrieval.
- One or more embodiments have at least two different classifiers that independently classify multiple documents to obtain at least two sets of resulting documents.
- the classification from at least one classifier is used to generate a recall metric for another classifier.
- the recall metric is an estimation of a number of documents that belong to the class but are not classified in the class by the classifier.
- the other classifier may be revised based on the estimated amount.
- the classifiers are applied to sparse document classification in a large corpus.
- the large corpus may have millions or billions of documents and the number of documents that actually belong to the class may be a tenth of a percent or less of the total corpus.
- One or more embodiments may use the results of one classifier as a proxy for the corpus of documents in order to generate an estimate of recall for another classifier.
- the results of the second classifier act as a proxy for the entire corpus of documents.
- the classifiers satisfy a threshold degree of independence with respect to each other. Independence is discussed below with respect to FIG. 2.
- One or more embodiments of the invention may be applied to a social network.
- social content is segregated into corpora, where content items that share common features or attributes are assigned to the same corpus.
- a document comprises all social content that is attributable to a particular user, and such a document is associated with the user.
- the corpus hence can consist of multiple documents (hundreds of millions or even billions), each being associated with a particular user.
- an Ads system can infer topics of interest to a user by detecting topics discussed in the user's document and then provide targeted advertisement, the inference being that a user is interested in the topics discussed in her document of social content.
- At least two classifiers for sparse topics can be evaluated and calibrated to recall effectively documents attributable to the sparse topics.
- one classifier may operate over "documents" made up of the social media messages from an account, while the second classifier may use the users followed by the account, the account's bio text, or other signals.
- Ads targeting can be highly granular (i.e., specific to the sparse topic), despite the corpus of documents of social content being very large.
- sparse topics can be, for example, a particular model of car with particular options, which provides more precise Ads targeting than just the car model.
- users sharing similar topics of interest can be grouped together in a same group, and similar curated content can be provided to the users in the group.
- Curated content relating to news about foreign policy and sports for example, can be provided to users in a group associated with foreign policy and sports.
- Curated content relating to news about sports and food can be provided to users of another group, one associated with sports and food.
- user grouping can be implemented to facilitate programmatic curation.
- curated content is modular so that each module covers one topic that may be of interest to users.
- a programmatic curation system can build curated content for each group of users of a social platform by selecting modules of content that corresponds to the topics of interest associated with the group. Because programmatic curation can be implemented with low latency, the content provided can be fresh content, even breaking news.
- FIG. 1A shows a schematic diagram of a system for a social network in accordance with one or more embodiments of the invention.
- FIG. 1A shows a social network application (100) and a client (120) in accordance with one or more embodiments of the invention.
- the social network application (100) may also be referred to as a messaging platform in accordance with one or more embodiments of the invention.
- a social network application (100) connects users to other users (i.e., clients) of the social network application (100), exchanges social networking messages between connected users of the social network application (100), and provides an interface for a user to create and view social network messages.
- social network messages are broadcast social networking messages that are transmitted to at least a set of users.
- the users in the set may be self-selected (e.g., followers of the transmitting user) or users that satisfy a certain status with the transmitting user (e.g., belong to a group, friend, family, etc.).
- the social networking messages may include, for example, a comment from a user on a document, a personal status update, a reference to a document, and/or other information or combination thereof.
- the social network application (100) includes functionality to receive an original reference from a user for a document, generate a social network reference from the original reference, and transmit the social network reference to other users.
- a user may share the document via the social network application (100) by sending a message containing a reference to the document to other users or posting a social network reference to the document.
- the original reference is a reference to the location of the published document, such as a uniform resource locator (URL) of a web page.
- the social network reference is an indirect reference to the location of the published document.
- the social network reference is mapped to the original reference in the discovery repository (148).
- the social network application may be configured to perform analytics on the engagement of the social network reference and/or shorten the original reference.
- the social network reference and the original reference may be a hypertext transfer protocol link or another mechanism for referencing the location of a document.
- the social network application (100) has multiple components including a classification engine (150), a machine learning module (not shown), a discovery repository (148), a frontend module (125), a routing module (155), a graph fanout module (130), a delivery module (135), a message repository (140), a connection graph repository (142), a stream repository (144), and an account repository (146).
- the social network application (100) may be located on the same device (e.g., a server, mainframe, desktop Personal Computer (PC), laptop, Personal Digital Assistant (PDA), telephone, mobile phone, kiosk, cable box, and any other device) or may be located on separate devices connected by a network (e.g., a local area network (LAN), the Internet, etc.). More than one of each separate component running on a device, as well as any combination of these components, may exist within a given embodiment of the invention.
- the social network application (100) is a platform for facilitating real-time communication between one or more entities.
- the social network application (100) may store millions of accounts of individuals, businesses, and/or other entities (e.g., pseudonym accounts, novelty accounts, etc.). One or more users of each account may use the social network application (100) to send social networking messages to other accounts inside and/or outside of the social network application (100).
- the social network application (100) may be configured to enable users to communicate in "real-time", i.e., to converse with other users with a minimal delay and to conduct a conversation with one or more other users during simultaneous sessions.
- the social network application (100) may allow a user to broadcast social networking messages and may display the social networking messages to one or more other users within a reasonable time frame so as to facilitate a live conversation between the users.
- Recipients of a social networking message may have a predefined graph relationship with an account of the user broadcasting the social networking message.
- the user is not an account holder or is not logged in to an account of the social network application (100).
- the social network application (100) may be configured to allow the user to broadcast social networking messages and/or to utilize other functionality of the social network application (100) by associating the user with a temporary account or identifier.
- connection graph repository (142) is configured to store one or more connection graphs.
- FIG. 1B shows an example depiction of a connection graph (299) in accordance with one or more embodiments of the invention.
- the connection graph (299) has multiple components, including nodes representing accounts of the social network application (100) (i.e., Account A (200), Account B (202), Account C (204), Account D (206), Account E (208), Account F (210), Account G (212)) and edges connecting the various nodes.
- the connection graph (299) is a data structure representing relationships between accounts.
- connection graph (299) represents accounts as nodes and relationships as edges connecting one or more nodes.
- a relationship may refer to any association between the accounts (e.g., following, friending, subscribing, tracking, liking, tagging, etc.).
- the edges of the connection graph (299) may be directed and/or undirected based on the type of relationship (e.g., bidirectional, unidirectional), in accordance with various embodiments of the invention.
- the routing module (155) includes functionality to receive one or more social networking messages and to store the social networking messages in the message repository (140).
- the routing module (155) may include functionality to assign an identifier to the social networking message and to notify the graph fanout module (130) of a sender of the social networking message.
- the graph fanout module (130) includes functionality to retrieve graph data from the connection graph repository (142) and to use the graph data to determine which accounts in the social network application (100) should receive the social networking message.
- the graph data may reflect which accounts in the social network application are "following" a particular account and are, therefore, subscribed to receive status social networking messages from the particular account.
- the delivery module (135) includes functionality to receive a list of accounts from the graph fanout module (130) and the message identifier generated by the routing module (155) and to insert the message identifier into stream data associated with each identified account.
- the delivery module (135) may then store the message list in the stream repository (144).
- the stream data stored in the stream repository (144) may make up one or more streams associated with one or more accounts of the social network application (100).
- a stream may be a dynamic list of social networking messages associated with one or more accounts or may reflect any arbitrary organization of social networking messages that is advantageous for the user of an account.
- the frontend module (125) is a software application or a set of related software applications configured to communicate with external entities (e.g., client (120)).
- the frontend module (125) may include the application programming interface (API) and/or any number of other components used for communicating with entities outside of the social network application (100).
- the API may include any number of specifications for making requests from and/or providing data to the social network application (100).
- a function provided by the API may provide artist/song recommendations to a requesting client (not shown).
- the frontend module (125) is configured to use one or more of the data repositories (message repository (140), connection graph repository (142), stream repository (144), discovery repository (148), and/or account repository (146)) to define streams for serving social networking messages (i.e., stream data) to a user of the account on the social network application (100).
- a user may use any client (120) to receive the social networking messages.
- an API of the frontend module (125) may be utilized to define one or more streams and/or to serve the stream data to the client for presentation to the user.
- different forms of message delivery may be handled by different modules in the frontend module (125).
- the user may specify particular receipt preferences, which are implemented by the frontend module (125).
- one or more of the data repositories is a database and/or storage service residing on one or more servers.
- one or more of the data repositories may be implemented as a storage service using service-oriented architecture (SOA) and configured to receive requests for data and to provide requested data to other components of the social network application (100).
- the message repository (140) may include one or more tables in a distributed database management system (DBMS), a clustered database, a standalone flat file, and/or any storage software residing on one or more physical storage devices. Examples of a storage device may include, but are not limited to, a hard disk drive, a solid state drive, and/or other memory device. Any type of database or storage application may be used, in accordance with various embodiments of the invention.
- one or more of the data repositories is a separate application or set of applications residing on one or more servers external (and communicatively coupled) to the social network application (100).
- one or more of the data repositories may be an integrated component of the social network application (100) and/or may reside, either partially or entirely, on one or more common hardware devices (e.g., a server).
- the discovery repository (148) may store document metadata about documents.
- the document metadata may include a list of keywords, tracking information, date of the document, a mapping between social media references and original references to documents, and other information.
- the message repository (140) stores social networking messages and social networking message metadata in accordance with one or more embodiments of the invention.
- the social networking message metadata may include an identifier of the originating user of the social networking message, a list of users that received the social networking message, a number of users that received the social networking message, statistics (e.g., a ratio of users connected to the originating user that forward the social networking message versus users not connected to the originating user that forward the social networking message), the time and date at which the social networking message is transmitted, and other information.
- connection graph repository may store information about users.
- the connection graph repository may relate user identifiers of a user to the user's preferences and history in the social network application (100).
- the user preferences and history may include language, connections of the user, topics in which the user is interested and other information.
- the classification engine (150) includes functionality to classify documents and determine metrics from classification in accordance with one or more embodiments of the invention.
- the classification engine (150) is described in further detail in FIG. 2.
- FIG. 2 shows a schematic diagram of a system in accordance with one or more embodiments of the invention.
- the system may include an internet repository (220), a social network repository (208), the classification engine (200), and a communication disperser (218) in accordance with one or more embodiments of the invention.
- An Internet repository (220) is any data repository available via the Internet.
- the Internet repository (220) is any storage unit of Internet documents (222).
- the Internet repository (220) may be a collection of one or more personal computers, servers, mobile devices, or any other computing system or collection thereof that is connected via the Internet.
- the Internet repository (220) may be controlled by multiple disparate companies and users.
- the Internet repository (220) includes functionality to store Internet documents (222).
- Internet documents (222) are documents that are available via the Internet.
- a document is any type of stored information.
- a document may be a technical paper, an article, an audio file, a webpage, a file, a tweet, a posting, a user profile, a message posted or communicated via a social media platform, a user's digital footprint or any other stored information.
- the user's digital footprint may include the collection of documents that the user has accessed, submitted social media messages about, authored, or with which the user has another direct relationship.
- the document may have document content and metadata.
- Document content is the content of the document (e.g., the main body of the document) that is presented to users when the document is displayed or played to the user.
- Metadata is information about the document.
- the metadata may include the name and size of the document, the author, edit history (e.g., including a user identifier, a time, and tracking information about the edits made for each edit in the edit history), creation time of the document, security features enabled on the document, and other information.
- the security features may include password protection, public key encryption, or other security features, or a combination of features.
- a document may have one or more attributes.
- An attribute is a feature of a portion of the document content or metadata.
- the attribute may be information explicitly present in the document content or metadata or may be information derived from the document content and/or metadata.
- an attribute may be a name, a title, a table of contents in the document content, an education level identifier of the targeted reader, a number of retransmissions of the document, size of the document, a uniform resource identifier, or other information.
- the social network repository (208) is any type of storage unit and/or device (e.g., a file system, database, collection of tables, or any other storage mechanism) for storing data. Further, the social network repository (208) may include multiple different storage units and/or devices. The multiple different storage units and/or devices may or may not be of the same type or located at the same physical site. The social network repository (208) may correspond to any of the repositories or any combinations of repositories shown in FIG. 1A.
- the social network repository (208) may store profiles (210) for each user, social network documents (212), and classifier results (e.g., classifier X results (214), classifier Y results (216)).
- the profiles (210) correspond to information about a user.
- a profile may include a unique identifier of the user, a name, message preferences for the user, public and private identifier, location information, photo, links to messages, connections to other users and groups, and other profile information.
- Social network documents (212) correspond to any document for a social network.
- the social network documents (212) may include a posting by a user, a technical paper that is stored in the social network repository, a user profile, a status update, or any other document that is local in the social network repository.
- the classifier results correspond to results of the classification.
- the classifier results may include, for each document classified by a classifier, a unique document identifier of the document and a class of the document.
- classifier results may be stored with the document or separate from the document.
- the classifier results may be a relationship between the document and the class of the document.
- the classifier results may be implemented as tags included in an index of documents in the corpus.
- the system may include a classification engine (200) having one or more classifiers (e.g., classifier X (202), classifier Y (204)) and an analyzer (206).
- a classifier is hardware, software, firmware, or any combination thereof that includes functionality to classify documents (e.g., social network documents (212) and/or Internet documents (222)).
- the classifiers may be a two-way classifier that classifies documents into one of two groups, or a multi-way classifier that classifies documents into one of multiple groups.
- a two-way classifier is a decision rule which, given a document, assigns the document a label from the set {+1, -1}.
- a +1 label is given to "positive” decisions where the classification decision indicates a belief that the document in question satisfies the search criteria, while a -1 label is given to "negative" decisions.
- the classifiers make decisions using the attributes of the input document, such as the text in the document, authorship information, or connections with other documents. In other words, the classifier performs the classification based on one or more attributes of the document. In one or more embodiments of the invention, the particular attributes used by the classifier are configurable and may be automatically updated.
- the classifiers satisfy a threshold degree of independence with respect to each other. In one or more embodiments of the invention, the classifiers are considered independent when they satisfy the following condition.
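- One plausible form of this condition, sketched here to be consistent with the degree of independence δ1 defined below (A1 and A2 denote the sets of documents each classifier labels as on-topic, and T denotes the set of truly on-topic documents), is that the classifiers' positive sets be approximately independent conditional on membership in T:

  P(d \in A_1 \cap A_2 \mid d \in T) \approx P(d \in A_1 \mid d \in T) \cdot P(d \in A_2 \mid d \in T)

  with δ1 measuring how closely the equality holds (for example, as the ratio of the smaller side to the larger side, so that δ1 = 1 corresponds to perfect conditional independence).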
- δ1 is the degree of independence between classifiers C1 and C2 as defined by the above equation.
- classifiers C1 and C2 satisfy the threshold degree of independence when δ1 is greater than the threshold degree of independence.
- the threshold degree of independence may be 0.95, 0.99, 0.90, or any other configurable value.
- the above equation may be deemed valid where the classifiers use different and non-overlapping attributes of the documents.
- Another way to characterize the degree of independence between classifiers is based on the documents that are not in the class (i.e., the off-topic documents, T^c).
- the following equation may be used to determine the degree of independence based on the number of documents that are not in the class.
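- A corresponding condition, sketched under the same assumptions as above, can be stated over the off-topic set T^c:

  P(d \in A_1 \cap A_2 \mid d \in T^c) \approx P(d \in A_1 \mid d \in T^c) \cdot P(d \in A_2 \mid d \in T^c)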
- the analyzer (206) includes functionality to analyze classifier results (e.g., classifier X results (214), classifier Y results (216)) in accordance with one or more embodiments of the invention.
- Recall reflects the propensity of a classifier not to miss documents that belong to the class.
- precision is a measurement of the percentage of documents that are added to a class by the classifier and that should be in the class (i.e., the proportion of documents that the classifier correctly places in the class).
- Recall estimation is an estimation of the percentage of documents that should be in the class but are not classified into the class by the classifier. For example, in a classification in which a document is either in the class or not in the class, recall estimation captures the percentage or other number of documents that the classifier misses adding to the class. Recall may be defined using the following equation:
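- In standard information-retrieval notation, with the terms defined below, this is:

  \mathrm{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}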
- relevant documents are all documents that should be classified into the class, and retrieved documents are documents that are classified into the class.
- precision is the fraction of documents retrieved by a classifier that are correctly retrieved.
- precision may be defined using the following equation:
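- In the same notation, with the terms defined below, this is:

  \mathrm{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}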
- relevant documents are all documents that should be classified into the class, and retrieved documents are documents that are classified into the class.
- the analyzer (206) includes functionality to execute multiple classifiers (classifier X (202), classifier Y (204)) and perform recall estimation based on the classification results.
- the analyzer (206) may further include functionality to revise at least one of the classifiers.
- the communication disperser (218) is hardware, software, firmware, or any combination thereof that includes functionality to disperse communications based on the classifier results.
- the communication disperser (218) may be a part of the social network application and/or an independent product, such as an application used by a third party.
- the communication disperser may be an advertising server that disperses communications corresponding to advertisements, a recommendation engine that includes functionality to transmit recommendations to a user, or any other disperser of communications.
- the recommendations may be recommendations of contacts to include in a social network, recommendations of documents, recommendations of profile preferences, or any other recommendation.
- Although FIG. 2 shows two classifiers, additional classifiers may be included without departing from the scope of the invention. Further, although FIG. 2 shows a communication disperser connected to the social network repository, any other module that uses classification results may be used without departing from the scope of the invention.
- Although FIGs. 1A, 1B, and 2 show a configuration of components, other configurations may be used without departing from the scope of the invention.
- various components may be combined to create a single component.
- the functionality performed by a single component may be performed by two or more components.
- FIGs. 3 and 4 show a flowchart in accordance with one or more embodiments of the invention. While the various steps in this flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. Furthermore, the steps may be performed actively or passively. For example, some steps may be performed using polling or be interrupt driven in accordance with one or more embodiments of the invention. By way of an example, determination steps may not require a processor to process an instruction unless an interrupt is received to signify that a condition exists in accordance with one or more embodiments of the invention. As another example, determination steps may be performed by performing a test, such as checking a data value to test whether the value is consistent with the tested condition, in accordance with one or more embodiments of the invention.
- FIG. 3 shows a flowchart for classifying documents associated with a user in accordance with one or more embodiments of the invention.
- the documents may be a digital footprint or a collection of documents related to a user.
- In Step 301, the documents are classified to obtain a first set of resulting documents in a class in accordance with one or more embodiments of the invention.
- the classifier identifies a document and extracts the attribute values of the document corresponding to the attributes in the attribute set of the classifier. By comparing the attribute values to the attributes of the classes, by which the classifier is trained, the classifier determines to which class the document should be added.
- the classifier repeats the process for each document.
- the classifier operates as a single thread serially. In other embodiments, the classifier may have multiple threads or multiple application instances that classify documents in parallel.
- each document may have one or more users associated with the document.
- a user may have sent a message having a link to the document, may be an author of the document, may have accessed the document, or may have performed another action with respect to the document.
- In Step 303, by accessing profile information about the user and/or metadata information for the document, one or more users may be identified that have a relationship to the document.
- one or more pre-defined relationship criteria may exist for the user's relationship to the document. For example, rather than any user having a relationship to the document, the pre-defined relationship criteria may specify a pre-defined relationship that the user has in order to be identified in Step 303.
- In Step 305, communications are sent to each of the users corresponding to at least one document that is classified, in accordance with one or more embodiments of the invention.
- the communication is sent based on the classification.
- the communication is sent to each user that has the one or more pre-defined relationships with the document in the class in response to the user having the pre-defined relationship.
- Communications may be sent via a user interface of an executing application, via a social media message, via electronic mail, or via any other technique.
- In Step 307, documents are classified using a second classifier to obtain a second set of resulting documents that are in the class, in accordance with one or more embodiments of the invention.
- the second classifier is selected based on a degree of independence to the first classifier.
- documents tend to be redundant. Namely, the same information is expressed in many different ways throughout the document.
- the second classifier may be trained to classify documents based on a different and non-overlapping set of attributes than the first classifier, based on attribute values of the attributes, based on different sections of content in a document, or based on another characteristic.
- the independence between the first classifier and the second classifier may not be perfect. For example, some dependency may exist between the attributes of the classifier. However, in accordance with one or more embodiments of the invention, the amount of independence between the classifiers satisfies a threshold. Constructing example sets of classifiers are discussed below. In one or more embodiments of the invention, classification of documents by the second classifier may be performed using a same or similar technique as discussed above with reference to Step 301 of FIG. 3.
- a recall metric is estimated based on the first set of resulting documents and the second set of resulting documents in accordance with one or more embodiments of the invention.
- one or more embodiments of the invention perform recall estimation on the first classifier using the results of the second classifier.
- Recall estimation may be performed using the following equation for two independent classifiers (e.g., classifier 1, classifier 2) performing a two-way classification:
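- One plausible form of this estimator, given here as a sketch (it treats the on-topic documents found by classifier 2 as a proxy for the full on-topic set, and assumes the classifiers are approximately independent on off-topic documents so that the raw overlap |A12| can be corrected for chance agreement), is:

  r \approx \frac{|A_{12}| - \frac{(1 - p_1)(1 - p_2)\,|A_1|\,|A_2|}{|U|}}{p_2\,|A_2|}

  where the quantities are defined below.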
- r is the estimated recall of classifier 1,
- A1 is the set of documents classified into the class by classifier 1,
- A2 is the set of documents classified into the class by classifier 2,
- A12 is the set of documents classified into the class by both classifiers,
- p1 is the precision of classifier 1,
- p2 is the precision of classifier 2, and
- U is the set of all documents in the collection.
- the above equation may be modified for additional classifiers, multi-way classifiers, or classifiers not having as much independence.
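- For illustration only, the estimator sketched above can be computed directly from classifier outputs as in the following snippet; the function name, its arguments, and the use of Python sets of document identifiers are assumptions made for the example, and the precisions p1 and p2 would come from human-judged samples:

```python
def estimate_recall(a1, a2, p1, p2, corpus_size):
    """Estimate the recall of classifier 1, using classifier 2 as a proxy for the corpus.

    a1, a2      -- sets of document identifiers labeled on-topic by classifiers 1 and 2
    p1, p2      -- estimated precisions of classifiers 1 and 2 (from sampled human judgments)
    corpus_size -- |U|, the total number of documents in the collection
    """
    a12 = len(a1 & a2)  # documents selected by both classifiers
    # Subtract the overlap expected by chance on off-topic documents,
    # assuming the classifiers are (approximately) independent there.
    chance_overlap = (1 - p1) * (1 - p2) * len(a1) * len(a2) / corpus_size
    true_overlap = max(a12 - chance_overlap, 0.0)
    # p2 * |A2| estimates the on-topic documents found by classifier 2,
    # which stands in for the full on-topic set T.
    return true_overlap / (p2 * len(a2))


# Example: two classifiers over a corpus of 1,000,000 documents.
a1 = {"doc-%d" % i for i in range(0, 900)}      # 900 documents flagged by classifier 1
a2 = {"doc-%d" % i for i in range(600, 1800)}   # 1,200 documents flagged by classifier 2
print(round(estimate_recall(a1, a2, p1=0.9, p2=0.85, corpus_size=1_000_000), 3))
```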
- In Step 311, a determination is made whether the recall metric satisfies a threshold in accordance with one or more embodiments of the invention. If the recall metric satisfies the threshold, then the flow may proceed to end. In other words, the communication may be deemed to be sent to all targeted users. In one or more embodiments of the invention, if the recall metric does not satisfy the threshold, the flow may proceed to Step 313.
- In Step 313, the first classifier is revised based on the recall metric to obtain a revised classifier in accordance with one or more embodiments of the invention. Revising the first classifier may include adding the attributes of the second classifier to the first classifier, retraining the first classifier using additional training samples, or performing another action.
- In Step 315, the documents are classified using the revised classifier in accordance with one or more embodiments of the invention.
- classifying documents by the revised classifier may be performed using a same or similar technique as discussed above with reference to Step 301 of FIG. 3.
- a revised classifier classifies the documents.
- the revised classifier may use the additional attributes or attribute values.
- In Step 317, any additional users are identified based on the classified documents in accordance with one or more embodiments of the invention. Identifying additional users may be performed using a same or similar technique as discussed above with reference to Step 303 of FIG. 3. In other words, using the revised classification results, additional users are identified.
- In Step 319, the communication is sent to the additional users in accordance with one or more embodiments of the invention. Sending the communication may be performed using a same or similar technique as discussed above with reference to Step 305 of FIG. 3. In one or more embodiments of the invention, rather than sending the communication to all users, the communication is sent only to the additional users added by revising the classifier.
- FIG. 4 shows a flowchart for classifying documents, such as for indexing purposes.
- the corpus of documents may include virtually any type of document, such as the documents discussed above with reference to FIG. 2.
- In Step 401, the documents are classified to obtain, in an index, a first set of resulting documents in a class in accordance with one or more embodiments of the invention.
- the classifier identifies a document and extracts the attribute values of the document corresponding to the attributes in the attribute set of the classifier. By comparing the attribute values to the attributes of the classes, by which the classifier is trained, the classifier determines to which class the document should be added. The classifier may then add the document to an index accordingly.
- the classifier repeats the process for each document.
- the classifier operates as a single thread serially. In other embodiments, the classifier may have multiple threads or multiple application instances that classify documents in parallel.
- In Step 403, documents are classified using a second classifier to obtain, in an index, a second set of resulting documents that are in the class, in accordance with one or more embodiments of the invention.
- Selecting the second classifier may be performed using the technique described above with reference to Step 307 of FIG. 3.
- classification of documents by the second classifier may be performed using a same or similar technique as discussed above with reference to Step 401 of FIG. 4.
- In Step 405, a recall metric is estimated based on the first set of resulting documents and the second set of resulting documents in accordance with one or more embodiments of the invention.
- In Step 407, a determination is made whether the recall metric satisfies a threshold in accordance with one or more embodiments of the invention. If the recall metric satisfies the threshold, then the flow may proceed to end. In one or more embodiments of the invention, if the recall metric does not satisfy the threshold, the flow may proceed to Step 409.
- In Step 409, the first classifier is revised based on the recall metric to obtain a revised classifier in accordance with one or more embodiments of the invention.
- In Step 411, the documents are classified using the revised classifier in accordance with one or more embodiments of the invention.
- Steps 405, 407, 409, and 411 may be respectively performed using the technique described above with reference to Step 309, 311, 313, and 315 of FIG. 3.
- a request for communication is received from a user in accordance with one or more embodiments of the invention.
- a user may request a set of documents that correspond to a class.
- the user may request documents that are indexed according to a particular topic.
- the user may request that analysis is performed based on the classes associated with documents.
- the analysis may be the percentages and types of documents that are in a class.
- In Step 417, classification results are obtained in response to the request.
- the analysis requested by the user is performed, the documents and/or identifiers of documents that are indexed with the class are obtained, or another set of classification results are obtained.
- In Step 419, the communication is sent to the user in accordance with one or more embodiments of the invention.
- the communication is generated with the requested information and sent to the user.
- Sending the communication may be performed using various techniques, such as via an electronic mail message, in a web page, as a page in a web application, or using any other techniques.
- sending a communication may be performed by partitioning a message into packets, and sending the packets on a network that connects various physical devices.
- a keyword filter is a special type of classifier for which a document is marked as on-topic if any of the words in the filter appear in the document.
- a seed term is defined to be a single word or phrase that captures the topic of interest in the broadest generality possible. Some examples of seed terms are sports, Mars, or Obama.
- the classifiers are defined as follows: the first classifier in the pair, C_seed, filters on the terms [seed; #seed] (in the case of social media messages), while the second filter, C_neighbors, filters on similarity neighbors of the seed terms.
- a number of different scores may be used for measuring overlap. Letting A_seed and A_y denote the documents found by the seed term filter and by a candidate keyword y, respectively, one such similarity score is the Jaccard similarity, given by the equation below.
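- The Jaccard similarity of the two document sets is:

  J(A_{seed}, A_y) = \frac{|A_{seed} \cap A_y|}{|A_{seed} \cup A_y|}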
- a first example of seed terms are [apple;#apple] and the similarity neighbors are [#ios6, #ipad3, #iphone, hack,macintosh, iphones, #siri, ios, macbook, icloud, ipad, Samsung, #ipodtouch, 4s, itunes, cydia, cider, #gadget, #tech, #tablet, app, connector, #mac,...].
- a second example of seed terms are [mars;#mars] and the similarity neighbors are [rover, nasa, #curiosity, #curiousity, image, mission, surface, #curiosityrover, bruno, milky, budget, @marscuriosity, gale, crater, orbiting, successfully, lands, landing, spectacular . . . ].
- a third example of seed terms are [obama;#obama] and the similarity neighbors are [@barackobama, barack, bush, #mitt2012, #obama2012, obamas, #dems, #gop, #military, romney, #idontsupportobama, potus, #president, administration, pres, #politics, voting . . . ].
- a fourth example of seed terms are [Olympics ;#olympics] and the similarity neighbors are [medalist, gold, london, gb, kirani, gymnast, kate, sprinter, winning, won, #boxing, soccer, watch, watching, #usa, #teamgb, #canada, #london, javelin, nbc, match, 2012, 400m . . . ].
- Another similarity score is the asymmetric overlap measure, using the following equation.
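- One plausible form of such an asymmetric measure, given here as a sketch (the normalization by the candidate set A_y, rather than by the seed set, is an assumption), scores a candidate term by the fraction of its documents that also match the seed filter:

  \mathrm{overlap}(A_{seed}, A_y) = \frac{|A_{seed} \cap A_y|}{|A_y|}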
- the top scoring neighbors (leaving out the seed terms) are included in the filter defining C_neighbors. Although some terms appearing in the list of neighbors may not be on-topic, both filters have overall precisions that are 'reasonable'. In some embodiments, the precision in the denominators of the above equations is only required not to be so small as to make the computation unstable, in addition to the conditional independence assumptions.
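- A minimal sketch of this neighbor-selection step, assuming the per-term document sets are already available as Python sets of document identifiers (the function and variable names are illustrative):

```python
def top_neighbors(seed_docs, candidate_docs_by_term, k=25):
    """Rank candidate terms by Jaccard similarity to the seed filter's documents.

    seed_docs              -- set of document ids matched by the seed terms
    candidate_docs_by_term -- dict mapping candidate term -> set of document ids it matches
    k                      -- number of top-scoring neighbor terms to keep
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b) if (a or b) else 0.0

    scored = [(term, jaccard(seed_docs, docs))
              for term, docs in candidate_docs_by_term.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:k]]
```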
- threshold-based classifiers and a social media classifier pair satisfying the conditional independence property may be considered.
- classifiers induced over a training corpus that produce a real-valued score when evaluated over an out-of-sample datum are considered.
- when the score exceeds a specified threshold, a document is labeled as on-topic.
- Many commonly used machine-learning techniques fall into this category, including logistic regression, SVM or Naive Bayes.
- the classifier may be limited to social media messages which also include links to documents on the web.
- a conditionally independent pair may be constructed as follows: one classifier is trained and runs entirely on the social media message text, while the other is based entirely on the web documents that the social media message links to. In the training, the processes generating each description/web-document pair are independent of each other. In some embodiments, the conditional independence holds only in an approximate sense.
- the second classifier, C2, is based entirely on social signals.
- the social signals may be composed of authorship (an author who is primarily known for technology articles would have all of his/her social media messages classified by C2 as on-topic for the topic technology).
- @mentions may provide additional social signals. For instance, knowing that a social media message has been sent as a reply to @barackobama may provide a signal that the social media message is about politics. Due to the fact that the signals used by each of these classifiers have independent origins, conditional independence may hold here as well.
- an estimate of the total number of positives, n_est, may be computed as follows:
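- One plausible form of this estimate, consistent with the recall estimator sketched earlier (the number of on-topic documents found by both classifiers is approximated by correcting |A12| for chance agreement on off-topic documents), is:

  n_{est} \approx \frac{(p_1\,|A_1|)\,(p_2\,|A_2|)}{|A_{12}| - \frac{(1 - p_1)(1 - p_2)\,|A_1|\,|A_2|}{|U|}}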
- the following example is for explanatory purposes only and not intended to limit the scope of the invention.
- classification is performed to classify which users would be appropriate to target for a particular advertisement campaign.
- the advertisement campaign is meant to reach a small number of users, such as targeting based on niche products.
- One classifier may be based, for example, on the corpus of social media messages produced, while another may be based on the users the account follows.
- the two classifiers may be used together to determine the recall metric of each classifier (e.g., the fraction of all appropriate targeting candidates each classifier was able to find).
- classifier X may classify based on the user profiles. Namely, classifier X searches for particular attributes in the user profiles that may be an indication of how affluent each user is. Classifier X stores the user identifiers of identified users that have at least a particular income or net worth level in classifier X results. Classifier Y searches for particular keywords in the user profiles that may be an indication of how affluent each user is and stores identified users that have at least the particular income or net worth level in classifier Y results.
- the classifier X results may be used by the analysis module with classifier Y results to estimate the recall of classifier Y results and, thus, the percentage of target users missed by classifier Y.
- if the percentage of recall is 30%, then the advertisement disperser may miss 70% of the users that may be interested in the luxury car.
- the estimated recall may be used to revise classifier Y creating a revised classifier.
- classifier Y may be retrained with an additional training set, may have additional keywords added, may have additional or alternative types of documents added, and/or may be combined with classifier X. If classifier X and classifier Y are combined to create the revised classifier, the recall and precision may be estimated based on the existing classifier X results and classifier Y results in accordance with one or more embodiments of the invention.
- the quality of the estimated recall may be dependent on the degree of independence between classifier X and classifier Y and/or the precision of the classifier X results and the classifier Y results.
- the quality of the estimate may be calculated by calculating the degree of independence between the classifier and/or results and/or by estimating the precision. With a greater degree of independence (i.e., more independent) and/or greater precision, the quality increases.
- One or more embodiments of the invention may be applied to social media analysis as follows. The creation of social media updates (e.g., social media messages, status posts), and the volume and/or availability of the updates, have resulted in their use for a variety of predictive tasks.
- the predictive tasks may include analyzing the sentiment around products and brands (and resulting potential sales), inferring the audience for sporting events, estimating popularity of politicians, etc.
- one or more embodiments may define independent classifiers that classify social media streams into topics.
- One or more embodiments may then estimate recall using the technique described above in order to determine the performance of one or both classifiers.
- a classifier may be updated and used to reclassify the social media updates.
- the results of the classification may be used to generate metrics describing the social media updates in accordance with the above predictive tasks.
- the social media metrics may be described in a communication to one or more users, such as another company that uses the results to perform operations.
- FIG. 5 shows an example, in accordance with one or more embodiments of the invention.
- FIG. 5 shows an example Venn diagram.
- the region having the set of all documents is labeled as region U (500) and is the entire rectangle.
- the region having the set of documents actually belonging to the class is the circle labeled region T (502).
- the region having the set of documents classified by classifier C1 as belonging to the class is the circle labeled region A1 (504).
- the region having the set of documents classified by classifier C2 as belonging to the class is the circle labeled region A2 (506).
- the recall r1 of C1 may be estimated using the following equation:
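- In terms of the regions of FIG. 5, one plausible form of this estimate is the fraction of C2's on-topic documents that C1 also finds:

  r_1 \approx \frac{|A_1 \cap A_2 \cap T|}{|A_2 \cap T|}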
- the above equation and FIG. 5 show the fraction of A2 that classifier C1 is able to find. If C1 and C2 are sufficiently independent, the estimate using the above equation reliably conveys the recall r1. Rather than independence over random draws from the entire set of documents, it is independence over the on-topic and off-topic subsets of U that makes the intuition behind the above equation valid, in accordance with one or more embodiments of the invention.
- Embodiments of the invention may be implemented on a computing system. Any combination of mobile, desktop, server, embedded, or other types of hardware may be used, for example, as shown in FIG. 6.
- the computing system (600) may include one or more computer processor(s) (602), associated memory (604) (e.g., random access memory (RAM), cache memory, flash memory, etc.), one or more storage device(s) (606) (e.g., a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, a flash memory stick, etc.), and numerous other elements and functionalities.
- the computer processor(s) (602) may be an integrated circuit for processing instructions.
- the computer processor(s) may be one or more cores, or micro-cores of a processor.
- the computing system (600) may also include one or more input device(s) (610), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device. Further, the computing system (600) may include one or more output device(s) (608), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device. One or more of the output device(s) may be the same or different from the input device(s).
- The computing system (600) may be connected to a network (612) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, a mobile network, or any other type of network) via a network interface connection (not shown).
- The input and output device(s) may be locally or remotely (e.g., via the network (612)) connected to the computer processor(s) (602), memory (604), and storage device(s) (606).
- Software instructions in the form of computer readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, DVD, storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium.
- The software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform embodiments of the invention.
- One or more elements of the aforementioned computing system (600) may be located at a remote location and connected to the other elements over a network (612).
- Embodiments of the invention may be implemented on a distributed system having a plurality of nodes, where each portion of the invention may be located on a different node within the distributed system.
- The node corresponds to a distinct computing device.
- The node may correspond to a computer processor with associated physical memory.
- The node may alternatively correspond to a computer processor or micro-core of a computer processor with shared memory and/or resources.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- Human Resources & Organizations (AREA)
- Tourism & Hospitality (AREA)
- Primary Health Care (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| BR112017009015A BR112017009015A2 (en) | 2014-10-28 | 2015-10-28 | classifier sensitivity estimate for sparse topics |
| KR1020207034677A KR102262953B1 (en) | 2014-10-28 | 2015-10-28 | Classifier recall estimation for sparse topics |
| EP15854422.1A EP3213235A4 (en) | 2014-10-28 | 2015-10-28 | Classifier recall estimation for sparse topics |
| CA2966350A CA2966350A1 (en) | 2014-10-28 | 2015-10-28 | Classifier recall estimation for sparse topics |
| KR1020197015145A KR102188337B1 (en) | 2014-10-28 | 2015-10-28 | Classifier recall estimation for sparse topics |
| KR1020177014481A KR101984773B1 (en) | 2014-10-28 | 2015-10-28 | Classifier recall estimation for sparse topics |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201462069618P | 2014-10-28 | 2014-10-28 | |
| US62/069,618 | 2014-10-28 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016069767A1 (en) | 2016-05-06 |
Family
ID=55858314
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2015/057851 WO2016069767A1 (en) | 2014-10-28 | 2015-10-28 | Classifier recall estimation for sparse topics |
Country Status (6)
| Country | Link |
|---|---|
| EP (1) | EP3213235A4 (en) |
| KR (3) | KR102188337B1 (en) |
| BR (1) | BR112017009015A2 (en) |
| CA (1) | CA2966350A1 (en) |
| DE (1) | DE202015009963U1 (en) |
| WO (1) | WO2016069767A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102093839B1 (en) | 2019-05-28 | 2020-05-04 | (주)다림티센 | Hemostatic agent and container containing the same |
2015
- 2015-10-28 KR KR1020197015145A patent/KR102188337B1/en not_active Expired - Fee Related
- 2015-10-28 EP EP15854422.1A patent/EP3213235A4/en not_active Ceased
- 2015-10-28 KR KR1020207034677A patent/KR102262953B1/en not_active Expired - Fee Related
- 2015-10-28 CA CA2966350A patent/CA2966350A1/en active Pending
- 2015-10-28 DE DE202015009963.7U patent/DE202015009963U1/en not_active Expired - Lifetime
- 2015-10-28 WO PCT/US2015/057851 patent/WO2016069767A1/en active Application Filing
- 2015-10-28 BR BR112017009015A patent/BR112017009015A2/en not_active Application Discontinuation
- 2015-10-28 KR KR1020177014481A patent/KR101984773B1/en not_active Expired - Fee Related
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060206443A1 (en) * | 2005-03-14 | 2006-09-14 | Forman George H | Method of, and system for, classification count adjustment |
| US20100114654A1 (en) * | 2008-10-31 | 2010-05-06 | Hewlett-Packard Development Company, L.P. | Learning user purchase intent from user-centric data |
| US8713007B1 (en) * | 2009-03-13 | 2014-04-29 | Google Inc. | Classifying documents using multiple classifiers |
| US20120116875A1 (en) * | 2010-11-05 | 2012-05-10 | Microsoft Corporation | Providing advertisements based on user grouping |
| WO2013025959A2 (en) * | 2011-08-18 | 2013-02-21 | Audax Health Solutions, Inc. | Methods and apparatus for classifying content |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3213235A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CA2966350A1 (en) | 2016-05-06 |
| KR102262953B1 (en) | 2021-06-09 |
| KR20200139843A (en) | 2020-12-14 |
| KR20190062613A (en) | 2019-06-05 |
| BR112017009015A2 (en) | 2017-12-26 |
| KR101984773B1 (en) | 2019-06-03 |
| KR20170139492A (en) | 2017-12-19 |
| KR102188337B1 (en) | 2020-12-09 |
| EP3213235A1 (en) | 2017-09-06 |
| EP3213235A4 (en) | 2018-01-03 |
| DE202015009963U1 (en) | 2022-01-31 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10936959B2 (en) | Determining trustworthiness and compatibility of a person | |
| US10037320B2 (en) | Context-aware approach to detection of short irrelevant texts | |
| US11310338B1 (en) | Method and system for topic disambiguation and classification | |
| US8832188B1 (en) | Determining language of text fragments | |
| US10839013B1 (en) | Generating a graphical representation of relationships among a set of articles and information associated with the set of articles | |
| US20150081725A1 (en) | System and method for actively obtaining social data | |
| US10387915B2 (en) | Digital magazine recommendations by topic | |
| US20140358630A1 (en) | Apparatus and process for conducting social media analytics | |
| US9787785B2 (en) | Providing recommendations for electronic presentations based on contextual and behavioral data | |
| US9846746B2 (en) | Querying groups of users based on user attributes for social analytics | |
| US10990620B2 (en) | Aiding composition of themed articles about popular and novel topics and offering users a navigable experience of associated content | |
| US11108717B1 (en) | Trends in a messaging platform | |
| US9824149B2 (en) | Opportunistically solving search use cases | |
| US11086905B1 (en) | Method and system for presenting stories | |
| EP3186744A1 (en) | Spam detection for online slide deck presentations | |
| Cui et al. | Personalized microblog recommendation using sentimental features | |
| KR102188337B1 (en) | Classifier recall estimation for sparse topics | |
| US10868872B2 (en) | Method and system for determining a source link to a source object | |
| Nugroho et al. | Matrix inter-joint factorization-a new approach for topic derivation in twitter | |
| US20150026266A1 (en) | Share to stream | |
| US20190068535A1 (en) | Self-healing content treatment system and method | |
| US10592539B1 (en) | Trends in a messaging platform |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15854422 Country of ref document: EP Kind code of ref document: A1 |
|
| ENP | Entry into the national phase |
Ref document number: 2966350 Country of ref document: CA |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| REEP | Request for entry into the european phase |
Ref document number: 2015854422 Country of ref document: EP |
|
| ENP | Entry into the national phase |
Ref document number: 20177014481 Country of ref document: KR Kind code of ref document: A |
|
| REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112017009015 Country of ref document: BR |
|
| ENP | Entry into the national phase |
Ref document number: 112017009015 Country of ref document: BR Kind code of ref document: A2 Effective date: 20170428 |