CA3190303A1 - System and method for addressing disinformation - Google Patents
System and method for addressing disinformation
- Publication number
- CA3190303A1
- Authority
- CA
- Canada
- Prior art keywords
- news
- processor
- executed
- article
- score
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Control Of Indicators Other Than Cathode Ray Tubes (AREA)
Abstract
A system for addressing disinformation in news includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to display, on a display of an electronic device, a content feed comprising a series of news items, such as news articles and other news media, received from one or more news organizations, and restrict the electronic device from publishing news articles to the content feed without prior authorization, such as vetting.
Description
BACKGROUND
1. Field [0001]
The present disclosure relates generally to systems and methods for addressing disinformation on social media.
2. Description of Related Art [0002]
In recent years, misinformation (e.g., disinformation) has been on the rise with the widespread use of digital media. Advancements in digital technology have enabled bad actors and/or automated systems (e.g., bots) to spread a form of news including deliberate disinformation or hoaxes (i.e., fake news).
This form of misinformation tends to damage the subject of the disinformation for financial and/or political gain by using sensationalist, dishonest, or outright fabricated stories, headlines, images, and/or videos. With the rise of social media consumption generally, disinformation has increasingly gone "viral" (e.g., spread rapidly among a large number of individuals) and found its way into mainstream media, causing widespread harm.
SUMMARY
[0003] The present disclosure relates to various embodiments of a system for addressing digital disinformation in news. In one embodiment, the system includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to display, on a display of an electronic device, a content feed including a series of news items received from one or more news organizations, and restrict the electronic device from publishing news articles to the content feed.
[0004]
The news items may include a news article, a news video, and/or a news image.
[0005]
The instructions, when executed by the processor, may further cause the processor to display the series of news items received only from a white list of news organizations.
[0006]
The instructions, when executed by the processor, may further cause the processor to send a notification, to the electronic device, in response to a determination that a news item of the series of news items that was displayed on the display of the electronic device contains disinformation.
[0007]
The instructions, when executed by the processor, may further cause the processor to send a notification, to the electronic device, in response to a new news item being published by a publisher.
[0008] The instructions, when executed by the processor, may further cause the processor to display a comment, a "like," a share, or a play associated with a news item of the series of news items received from the electronic device.
[0009] In another embodiment, the system includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to determine, utilizing a bias detection model, a bias score for each component of a news item from a news organization, and display, on a display of an electronic device, components of the news item having a bias score lower than a threshold value.
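The bias-threshold filtering of paragraph [0009] can be sketched as follows. This is a minimal illustration only: the scoring model is a toy word-list stand-in for whatever trained bias detection model the patent contemplates, and the threshold value of 0.2 is an assumed default, not a figure from the disclosure.

```python
def filter_by_bias(components, bias_model, threshold=0.2):
    """Keep only components whose bias score is below the threshold ([0009]).

    Both `bias_model` and `threshold` are illustrative assumptions; the
    patent does not fix either.
    """
    return [c for c in components if bias_model(c) < threshold]

def toy_bias_model(sentence):
    # Hypothetical stand-in: fraction of words drawn from a small
    # list of emotionally loaded terms.
    loaded = {"outrageous", "disgraceful", "heroic"}
    words = sentence.lower().split()
    return sum(w.strip(".,!") in loaded for w in words) / max(len(words), 1)

article = [
    "The council approved the budget on Tuesday.",
    "This outrageous decision is a disgraceful betrayal.",
]
print(filter_by_bias(article, toy_bias_model))
```

With this toy model, the factual first sentence scores 0 and is kept, while the loaded second sentence scores above the threshold and is dropped.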
[0010] The news item may be a news article, and the components may be a series of sentences of the news article.
[0011] The instructions, when executed by the processor, may cause the processor to split the article into the series of sentences, and determine an emotion associated with each sentence of the series of sentences.
[0012] The instructions, when executed by the processor, may cause the processor to determine a polarity in a range from -1 to 1 to determine the emotion associated with each sentence of the series of sentences in which 1 is a positive statement and -1 is a negative statement.
[0013] The instructions, when executed by the processor, may cause the processor to determine a subjectivity score in a range from 0 to 1 to determine the emotion associated with each sentence of the plurality of sentences in which 0 is a factual statement and 1 is a personal opinion.
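The per-sentence polarity and subjectivity scoring of paragraphs [0012]-[0013] can be sketched as a single function. The word lexicon below is a toy stand-in for a trained sentiment model; only the score ranges (-1 to 1 for polarity, 0 to 1 for subjectivity) come from the disclosure.

```python
def sentence_emotion(sentence, lexicon):
    """Score a sentence per [0012]-[0013]: polarity in [-1, 1]
    (-1 negative, 1 positive) and subjectivity in [0, 1]
    (0 factual statement, 1 personal opinion).

    `lexicon` maps words to hypothetical (polarity, subjectivity) pairs.
    """
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return 0.0, 0.0  # no opinion-bearing words: neutral and factual
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

# Hypothetical lexicon, illustrative values only.
LEXICON = {"wonderful": (0.9, 0.8), "terrible": (-0.9, 0.9), "reported": (0.0, 0.1)}

print(sentence_emotion("The outcome was wonderful.", LEXICON))
print(sentence_emotion("The agency reported the figures.", LEXICON))
```

An opinionated sentence scores high on both axes, while a neutral report stays near zero polarity and low subjectivity.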
[0014] The instructions, when executed by the processor, may further cause the processor to recursively apply the bias detection model to the article including a first application of the bias detection model and a second application of the bias detection model, remove at least one sentence of the article before the first application of the bias detection model, reintroduce the at least one sentence to the article after the first application of the bias detection model and before the second application of the bias detection model, and determine a change in the bias score between the first application of the bias detection model and the second application of the bias detection model.
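The remove/score/reintroduce/rescore procedure of paragraph [0014] amounts to measuring how much a single sentence shifts an article's overall bias score. A minimal sketch, assuming a hypothetical whole-article scoring model:

```python
def bias_delta(sentences, index, article_bias_model):
    """Change in bias score attributable to one sentence, per [0014]:
    score the article with the sentence removed (first application),
    reintroduce it, score again (second application), and take the
    difference. `article_bias_model` is a hypothetical stand-in that
    scores a list of sentences in [0, 1].
    """
    without = sentences[:index] + sentences[index + 1:]
    first_score = article_bias_model(without)      # first application, sentence removed
    second_score = article_bias_model(sentences)   # second application, sentence reintroduced
    return second_score - first_score

def toy_article_bias(sentences):
    # Illustrative only: fraction of words from a small loaded-term list.
    loaded = {"disaster", "corrupt"}
    words = " ".join(sentences).lower().split()
    return sum(w.strip(".,") in loaded for w in words) / max(len(words), 1)

article = ["Officials met on Monday.", "The corrupt plan is a disaster."]
print(bias_delta(article, 1, toy_article_bias))
```

A large positive delta flags the removed sentence as a major contributor to the article's bias score.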
[0015] In a further embodiment, the system includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to determine, utilizing a claim detection model, a claim score for each component of a news item from a news organization, and output components of the news item having a claim score lower than a threshold value.
[0016] The news item may be a news article, and the components may include a series of sentences of the news article.
[0017] The instructions, when executed by the processor, may cause the processor to request articles from a fake news corpus to determine the claim score.
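Paragraph [0017]'s use of a fake news corpus can be illustrated by comparing a candidate sentence against each corpus article. The Jaccard word-overlap measure below is an assumption for illustration; the patent does not specify how the comparison is made.

```python
def claim_score(sentence, fake_news_corpus):
    """Score a sentence by its maximum Jaccard word overlap with any
    article in a fake-news corpus (cf. [0017]); a higher score means a
    closer match to known fake claims. The overlap measure is an
    illustrative assumption, not the patent's stated method.
    """
    s_words = set(sentence.lower().split())
    best = 0.0
    for article in fake_news_corpus:
        a_words = set(article.lower().split())
        if s_words | a_words:
            best = max(best, len(s_words & a_words) / len(s_words | a_words))
    return best

corpus = ["aliens built the pyramids last year"]
print(claim_score("scientists say aliens built the pyramids", corpus))
```

Components scoring below the threshold of [0015] (i.e., far from known fake claims) would then be output.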
[0018] In another embodiment, the system includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to determine, utilizing a hate speech detection model, a hate score for each component of a news item from a news organization, and output components of the news item having a hate score lower than a threshold value.
[0019] The news item may be a news article, and the components may include a series of sentences of the news article.
[0020] The instructions, when executed by the processor, may cause the processor to request articles from a hate news corpus to determine the hate score.
[0021] In yet another embodiment, the system includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to determine, utilizing a summarizer model, a summary score for each component of a news item from a news organization, and output components of the news item having a summary score lower than a threshold value.
[0022] The news item may be a news article, and the components may include a series of sentences of the news article.
[0023] The instructions, when executed by the processor, may cause the processor to punctuate, with a post processing pipeline, the output sentences.
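The summarizer of paragraphs [0021]-[0023] can be sketched as an extractive pipeline: score sentences, keep the top scorers, then punctuate the output in a post-processing step. Frequency-based scoring is an illustrative assumption; the patent does not name the underlying model.

```python
import re

def summarize(sentences, keep=2):
    """Extractive summarizer sketch (cf. [0021]-[0023]): score each
    sentence by the average corpus frequency of its words, keep the
    top scorers, then run a post-processing step that punctuates the
    output sentences. The scoring scheme is an assumption.
    """
    tokens = re.findall(r"[a-z']+", " ".join(sentences).lower())
    freq = {}
    for t in tokens:
        freq[t] = freq.get(t, 0) + 1

    def score(sentence):
        toks = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq.get(t, 0) for t in toks) / max(len(toks), 1)

    ranked = set(sorted(sentences, key=score, reverse=True)[:keep])
    kept = [s for s in sentences if s in ranked]  # restore article order
    # Post-processing pipeline: ensure terminal punctuation ([0023]).
    return [s if s.rstrip().endswith((".", "!", "?")) else s.rstrip() + "." for s in kept]

article = [
    "The city council approved the transit budget.",
    "Wow.",
    "The budget funds new transit lines across the city",
]
print(summarize(article))
```

The low-content interjection is dropped, and the unpunctuated final sentence gains a terminal period in post-processing.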
[0024] This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in limiting the scope of the claimed subject matter. One or more of the described features may be combined with one or more other described features to provide a workable device, system, or method.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] The patent or application file contains at least one drawing executed in color.
Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0026] The features and advantages of embodiments of the present disclosure will become more apparent by reference to the following detailed description when considered in conjunction with the following drawings. In the drawings, like reference numerals are used throughout the figures to reference like features and components.
The figures are not necessarily drawn to scale.
[0027] FIG. 1 is a block diagram of a system for processing and communicating a data feed according to one embodiment of the present disclosure;
[0028] FIG. 2 is a flowchart illustrating tasks of a method of detecting bias in an article according to one embodiment of the present disclosure;
[0029] FIG. 3 is a flowchart illustrating tasks of a method of analyzing claims in an article according to one embodiment of the present disclosure;
[0030] FIG. 4 is a flowchart illustrating tasks of a method of detecting hate speech in an article according to one embodiment of the present disclosure;
[0031] FIG. 5 is a flowchart illustrating tasks of a method of summarizing an article according to one embodiment of the present disclosure;
[0032] FIGS. 6 and 7 are screenshots of a data feed of a platform and a feature enabling a business, a network, or other data provider to send "push" notifications to subscribers, respectively, according to one embodiment of the present disclosure; and
[0033] FIGS. 8A-8G depict screenshots of a mobile application incorporating the algorithms and other functionality of the present disclosure.
DETAILED DESCRIPTION
[0034] The present disclosure is directed to various embodiments of systems and methods for a closed social networking platform including a data architecture that has a finite number of options or choices to reduce the spread of disinformation among one or more users, media broadcasts, videos, news articles, etc. The systems and methods of the present disclosure include providing a data feed including media broadcasts, videos, and/or news articles, etc. where users are limited to a finite set of options (e.g., commenting and/or "liking") and are unable to take any action that impacts the data feed such as publishing articles to the data feed or altering the published content on the data feed, unless previously verified or otherwise authorized (e.g., vetted or screened). The finite set of options reduces the number of interactions for review and categorization of the interactions based on reduced variability and increased predictability. The systems and methods of the present disclosure include one or more data algorithms processing content on the data feed and/or interactions.
In one or more embodiments, the data algorithms include a bias detection method, a claim detection method, a hate speech detection method, a summarizer, and other digital media content-related algorithms, among other things. These algorithms and methodologies may be applied to, or incorporated in, any suitable system or platform (e.g., these algorithms and methodologies may be incorporated into a third-party platform).
[0035] FIG. 1 is a block diagram of a system 100 for processing and communicating a data feed 104 over a data network 102 according to one embodiment of the present disclosure. The system 100 includes a server 106, one or more electronic devices 108 operated by one or more corresponding users 110, and a data provider 112.
The one or more users may be participants in the system 100 for processing and communicating a data feed 104 that combats or reduces disinformation. The one or more users may operate the electronic devices 108 to view and interact with the data feed 104. The number of electronic devices 108 and users 110 may vary according to the design of the server 106 and the system 100, and are not limited to the number illustrated in FIG. 1.
[0036] In one or more embodiments, the server 106 is connected to (i.e., in electronic communication with) a plurality of electronic devices 108 over a data network 102 such as, for example, a local area network or a wide area network (e.g., a public Internet). The server 106 includes one or more software modules 109 for coordinating communications and interactions between the users 110, determining the data feed 104, and applying one or more algorithms directed toward bias detection, hate speech detection, article summarization, and claim checking, among other features, data processes, or algorithms directed toward combating disinformation on social media. The algorithms will be described in more detail below.
[0037] In one or more embodiments, the server 106 includes a mass storage device or database 114 such as, for example, a disk drive, drive array, flash memory, magnetic tape, or other suitable mass storage device for storing information used by the server 106. For example, the database 114 may store personal profile information (e.g., a "handle") about the users 110, interactions between the users 110, interactions between a user and a corresponding data feed 104, business/network data, content 116 from a data provider 112, and/or analysis results (e.g., preprocessed data) based on a bias detection method, a claim detection method, a hate speech detection method, an article summarization method, and/or other digital media content related algorithms, among other things. In one or more embodiments, the database may store any other relevant information for facilitating interactions between users 110, determining a data feed 104, and providing a data feed 104. Although the database 114 is included in the server 106 as illustrated in FIG. 1, in one or more embodiments, the server 106 may be connected to an external database that is not a part of the server 106, in which case, the database 114 may be used in addition to the external database or be omitted entirely.
[0038] The server 106 further includes a processor or central processing unit (CPU) 118, which executes program instructions from memory 120 and interacts with other system components to perform various methods and operations according to one or more embodiments of the present invention. The memory 120 is implemented using any suitable memory device, such as a random access memory (RAM), and may additionally operate as a computer-readable storage medium having non-transitory computer readable instructions stored therein that when executed by a processor cause the processor to control and manage interactions and facilitate communications between users 110 using corresponding electronic devices 108, data providers providing content 116, analysis of content 116, and/or a data feed 104 over the data network 102.
[0039] The term "processor" is used herein to include any combination of hardware, firmware, and software, employed to process data or digital signals. The hardware of a processor may include, for example, application specific integrated circuits (ASICs), general purpose or special purpose central processors (CPUs), digital signal processors (DSPs), graphics processors (GPUs), and programmable logic devices such as field programmable gate arrays (FPGAs). In a processor, as used herein, each function is performed either by hardware configured, i.e., hard-wired, to perform that function, or by more general purpose hardware, such as a CPU, configured to execute instructions stored in a non-transitory storage medium. A processor may be fabricated on a single printed wiring board (PWB) or distributed over several interconnected PWBs. A processor may contain other processors, for example, a processor may include two processors, an FPGA and a CPU, interconnected on a PWB.
[0040] According to some embodiments of the present invention, the electronic devices 108 may connect to the data network 102 using a telephone connection, satellite connection, cable connection, radio frequency communication, or any suitable wired or wireless data communication mechanism. To this end, the electronic devices 108 may take the form of, for example, a personal computer (PC), hand-held personal computer (HPC), personal digital assistant (PDA), tablet or touch screen computer system, telephone, satellite network, cellular telephone, smartphone, an augmented reality system, a virtual reality system, an autonomous vehicle, or any suitable consumer electronics device.
[0041] In one or more embodiments, a data provider 112 publishes one or more items of content 116 including an article 130, a video 122, text 124, an image 126, and/or any other suitable content 128 (e.g., links to other articles, citations, metadata, and any other suitable elements in a piece or item of content). The data provider 112 may transmit or submit the one or more items of content 116 to the server 106 for processing. In one or more embodiments, the server 106 extracts or scrapes the one or more items of content 116 from the website hosting the one or more items of content 116 (e.g., via web scraping, web harvesting, web data extraction, etc.).
[0042] The server 106 may aggregate one or more items of content 116 from multiple data providers 112 and determine which items of content (e.g., news articles, media broadcasts, or other news content) are desirable for a data feed 104 that may be provided to one or more users 110 through one or more electronic devices 108. In some embodiments, the server 106 provides multiple data feeds 104 that are different from each other to each electronic device 108, and, in some embodiments, the server 106 provides a single shared data feed to multiple electronic devices 108. The data feed 104 includes broadcasts, videos, articles, blogs, messages, images, and/or other suitable communication media content for viewing by one or more users 110. In one or more embodiments, the content of the data feed 104 is only from data providers 112 having reliable journalism credentials such as articles sourced by national or international news organizations (e.g., a white list of news organizations).
In one or more embodiments, one or more items of content 116 are cross-referenced to determine credibility for inclusion in the data feed 104.
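The white-list admission rule of paragraph [0042] reduces to a simple membership check at ingestion time. A minimal sketch, where the organization names are hypothetical placeholders:

```python
# Hypothetical white list of vetted news organizations; names are illustrative.
WHITELISTED_SOURCES = {"example-wire.com", "example-times.com"}

def admit_to_feed(item):
    """Admit a content item to the data feed only if its source appears
    on the white list of vetted news organizations (cf. [0042]).
    """
    return item["source"] in WHITELISTED_SOURCES

items = [
    {"title": "Budget passes", "source": "example-wire.com"},
    {"title": "Shocking rumor", "source": "unvetted-blog.example"},
]
feed = [i for i in items if admit_to_feed(i)]
print([i["title"] for i in feed])
```

Content from sources outside the white list never enters the feed, regardless of its popularity.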
[0043] Accordingly, the content of the data feed 104 is determined solely by the server 106 and users 110 do not have the option to take any action that would impact the data feed 104 (or a particular data feed, such as the primary/trending data feed) such as publishing their own articles or otherwise modifying the content of the data feed 104, unless previously verified as a content provider or otherwise authorized (e.g., vetted or screened). That is, in one or more embodiments, the data feed 104 is a closed system that cannot be modified by the user, other than by the user subscribing to receive articles from particular news organizations. Due to this architecture, data provided to the users 110 is drastically limited which enables journalists, moderators, and algorithmic processes to effectively manage or monitor the content of the data feed 104 (i.e., the amount of content displayed on the data feed 104 is drastically reduced compared to an "open" system in which user can post their own content, such as articles, pictures, and videos, which enables effective moderation of the content on the data feed 104 and combats the threat of deep fakes or other data manipulation by outside software, bots, and actors). In one or more embodiments, data algorithms and other processes may be used to identify and eradicate the aforementioned types of content manipulation. Journalists and moderators may more easily identify inaccurate media content prior to and after publishing the data feed 104 compared to an "open" system in which users are freely able to post content (e.g., articles, videos, images). Additionally, a user may be "push"
notified if an article they viewed was proven via a fact check to be disinformation.
[0044] In some embodiments, the journalists and moderators may be live humans assisted by automated or semi-automated systems including algorithms detecting bias, hate speech, reliable claims, unreliable claims, inaccurate or misleading claims, and/or any other suitable functions. In some embodiments, the functions performed by journalists and moderators are performed by automated systems.
[0045] In one or more embodiments, the system 100 provides the data feed 104 including media broadcasts, videos, news articles, and/or any other web-based content for viewing by one or more users 110. The data feed 104 may present a variety of content that may be prepared, reviewed, and/or selected by the server 106.
However, a user 110 viewing the data feed 104 is presented with a finite set of options.
In one or more embodiments, users are unable to take any action that impacts the data feed such as publishing their own articles or content as part of the data feed, unless previously verified as a content provider or otherwise authorized (e.g., vetted or screened). For example, a user may be limited to expressing approval (e.g., "like") a portion (e.g., articles, images, videos, etc.) of the data feed 104, comment on a portion of the data feed 104, play a video or translated audio version of the data feed (e.g., "watch" or "listen"), vote on a portion of the data feed 104 (e.g., rating an article based on any relevant factor, such as bias, trustworthiness, and/or relevancy), and/or ping (e.g., "share" with) another user, group, or audience with respect to a portion of the data feed 104 as part of the finite set of options provided to the users 110. In one or more embodiments, the users 110 options are not limited to "liking", "sharing", voting or rating, watching or playing, and/or commenting on portions of the data feed 104.
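The finite set of user options described in paragraph [0045] could be modeled as a closed enumeration, with any action outside the set rejected. This is a sketch of the design idea only; the action names and handler are assumptions, not the patent's implementation.

```python
from enum import Enum

class Interaction(Enum):
    """The finite set of permitted user actions on the data feed ([0045]).
    Publishing or editing feed content is deliberately not represented.
    """
    LIKE = "like"
    COMMENT = "comment"
    PLAY = "play"
    VOTE = "vote"
    SHARE = "share"

def handle_interaction(action, item_id):
    # Reject anything outside the closed set of options.
    if not isinstance(action, Interaction):
        raise ValueError(f"action not in the permitted set: {action!r}")
    return {"action": action.value, "item": item_id}

print(handle_interaction(Interaction.LIKE, "article-42"))
```

Because the set of actions is closed at the type level, a "publish" action cannot even be expressed, which mirrors the closed-feed architecture.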
Furthermore, in one or more embodiments, users may be able to upload news articles, broadcasts, and/or other media content to their "stories." However, in one or more embodiments, the software module 109 does not permit user-generated content (e.g., user-generated articles) to be posted to the "stories," unless previously verified as a content provider or otherwise authorized (e.g., vetted or screened), although small snippets of text, images, video, or audio may be permitted to be written or posted in the "stories." The stories may be public or private. In one or more embodiments, public or private stories may be shared publicly or privately (e.g., a private story may be shared privately with a VIP list of friends/subscribers, and a public story may be shared with everyone). Furthermore, in one or more embodiments, multiple private lists of VIP
access may be created for sharing stories. In one or more embodiments, the software module 109 is configured to enable users to have a personal profile that includes their contact information and relevant links to other platforms, among other things.
Additionally, in one or more embodiments, the software module 109 may enable users to message each other.
[0046] In one or more embodiments, "sharing" a portion of the data feed 104 (e.g., an article, video, and/or media broadcast) may further include "group sharing." For example, a user may form a group comprising a plurality of users (e.g., a group of friends) by inviting others users to join or form a group. Each group may have one or
The instructions, when executed by the processor, may further cause the processor to send a notification, to the electronic device, in response to a new news item being published by a publisher.
1 [0008] The instructions, when executed by the processor, may further cause the processor to display a comment, a "like," a share, or a play associated with a news item of the series of news items received from the electronic device.
[0009] In another embodiment, the system includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to determine, utilizing a bias detection model, a bias score for each component of a news item from a news organization, and display, on a display of an electronic device, components of the news item having a bias score lower than a threshold value.
[0010] The news item may be a news article, and the components may be a series of sentences of the news article.
[0011] The instructions, when executed by the processor, may cause the processor to split the article into the series of sentences, and determine an emotion associated with each sentence of the series of sentences.
[0012] The instructions, when executed by the processor, may cause the processor to determine a polarity in a range from -1 to 1 to determine the emotion associated with each sentence of the series of sentences in which 1 is a positive statement and -1 is a negative statement.
[0013] The instructions, when executed by the processor, may cause the processor to determine a subjectivity score in a range from 0 to 1 to determine the emotion associated with each sentence of the plurality of sentences in which 0 is a factual statement and 1 is a personal opinion.
[0014] The instructions, when executed by the processor, may further cause the processor to recursively apply the bias detection model to the article including a first application of the bias detection model and a second application of the bias detection model, remove at least one sentence of the article before the first application of the bias detection model, reintroduce the at least one sentence to the article after the first application of the bias detection model and before the second application of the bias detection model, and determine a change in the bias score between the first application of the bias detection model and the second application of the bias detection model.
[0015] In a further embodiment, the system includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to determine, utilizing a claim detection model, a claim score for each component of a news item from a news organization, and output components of the news item having a claim score lower than a threshold value.
1 [0016] The news item may be a news article, and the components may include a series of sentences of the news article.
[0017] The instructions, when executed by the processor, may cause the processor to request articles from a fake news corpus to determine the claim score.
[0018] In another embodiment, the system includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to determine, utilizing a hate speech detection model, a hate score for each component of a news item from a news organization, and output components of the news item having a hate score lower than a threshold value.
[0019] The news item may be a news article, and the components may include a series of sentences of the news article.
[0020] The instructions, when executed by the processor, may cause the processor to request articles from a hate news corpus to determine the hate score.
[0021] In yet another embodiment, the system includes a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to determine, utilizing a summarizer model, a summary score for each component of a news item from a news organization, and output components of the news item having a summary score lower than a threshold value.
[0022] The news item may be a news article, and the components may include a series of sentences of the news article.
[0023] The instructions, when executed by the processor, may cause the processor to punctuate, with a post processing pipeline, the output sentences.
[0024] This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in limiting the scope of the claimed subject matter. One or more of the described features may be combined with one or more other described features to provide a workable device, system, or method.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] The patent or application file contains at least one drawing executed in color.
Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0026] The features and advantages of embodiments of the present disclosure will become more apparent by reference to the following detailed description when considered in conjunction with the following drawings. In the drawings, like reference 1 numerals are used throughout the figures to reference like features and components.
The figures are not necessarily drawn to scale.
[0027] FIG. 1 is a block diagram of a system for processing and communicating a data feed according to one embodiment of the present disclosure;
[0028] FIG. 2 is a flowchart illustrating tasks of a method of detecting bias in an article according to one embodiment of the present disclosure;
[0029] FIG. 3 is a flowchart illustrating tasks of a method of analyzing claims in an article according to one embodiment of the present disclosure;
[0030] FIG. 4 is a flowchart illustrating tasks of a method of detecting hate speech in an article according to one embodiment of the present disclosure;
[0031] FIG. 5 is a flowchart illustrating tasks of a method of summarizing an article according to one embodiment of the present disclosure;
[0032] FIGS. 6 and 7 are screenshots of a data feed of a platform and a feature enabling a business, a network, or other data provider to send "push" notifications to subscribers, respectively, according to one embodiment of the present disclosure; and
[0033] FIGS. 8A-8G depict screenshots of a mobile application incorporating the algorithms and other functionality of the present disclosure.
DETAILED DESCRIPTION
[0034] The present disclosure is directed to various embodiments of systems and methods for a closed social networking platform including a data architecture that has a finite number of options or choices to reduce the spread of disinformation between one or more users, media broadcasts, videos, news articles, etc. The systems and methods of the present disclosure include providing a data feed including media broadcasts, videos, and/or news articles, etc., where users are limited to a finite set of options (e.g., commenting and/or "liking") and are unable to take any action that impacts the data feed, such as publishing articles to the data feed or altering the published content on the data feed, unless previously verified or otherwise authorized (e.g., vetted or screened). The finite set of options reduces the number of interactions to be reviewed and simplifies categorization of those interactions, owing to their reduced variability and increased predictability. The systems and methods of the present disclosure include one or more data algorithms processing content on the data feed and/or interactions.
In one or more embodiments, the data algorithms include a bias detection method, a claim detection method, a hate speech detection method, a summarizer, and other digital media content-related algorithms, among other things. These algorithms and methodologies may be applied to, or incorporated in, any suitable system or platform (e.g., these algorithms and methodologies may be incorporated into a third-party platform).
[0035] FIG. 1 is a block diagram of a system 100 for processing and communicating a data feed 104 over a data network 102 according to one embodiment of the present disclosure. The system 100 includes a server 106, one or more electronic devices 108 operated by one or more corresponding users 110, and a data provider 112.
The one or more users may be participants in the system 100 for processing and communicating a data feed 104 that combats or reduces disinformation. The one or more users may operate the electronic devices 108 to view and interact with the data feed 104. The number of electronic devices 108 and users 110 may vary according to the design of the server 106 and the system 100, and are not limited to the number illustrated in FIG. 1.
[0036] In one or more embodiments, the server 106 is connected to (i.e., in electronic communication with) a plurality of electronic devices 108 over a data network 102 such as, for example, a local area network or a wide area network (e.g., a public Internet). The server 106 includes one or more software modules 109 for coordinating communications and interactions between the users 110, determining the data feed 104, and applying one or more algorithms directed toward bias detection, hate speech detection, article summarization, and claim checking, among other features, data processes, or algorithms directed toward combating disinformation on social media. The algorithms will be described in more detail below.
[0037] In one or more embodiments, the server 106 includes a mass storage device or database 114 such as, for example, a disk drive, drive array, flash memory, magnetic tape, or other suitable mass storage device for storing information used by the server 106. For example, the database 114 may store personal profile information (e.g., a "handle") about the users 110, interactions between the users 110, interactions between a user and a corresponding data feed 104, business/network data, content 116 from a data provider 112, and/or analysis results (e.g., preprocessed data) based on a bias detection method, a claim detection method, a hate speech detection method, an article summarization method, and/or other digital media content related algorithms, among other things. In one or more embodiments, the database may store any other relevant information for facilitating interactions between users 110, determining a data feed 104, and providing a data feed 104. Although the database 114 is included in the server 106 as illustrated in FIG. 1, in one or more embodiments, the server 106 may be connected to an external database that is not a part of the server 106, in which case, the database 114 may be used in addition to the external database or be omitted entirely.
[0038] The server 106 further includes a processor or central processing unit (CPU) 118, which executes program instructions from memory 120 and interacts with other system components to perform various methods and operations according to one or more embodiments of the present invention. The memory 120 is implemented using any suitable memory device, such as a random access memory (RAM), and may additionally operate as a computer-readable storage medium having non-transitory computer-readable instructions stored therein that, when executed by a processor, cause the processor to control and manage interactions and facilitate communications between users 110 using corresponding electronic devices 108, data providers providing content 116, analysis of content 116, and/or a data feed 104 over the data network 102.
[0039] The term "processor" is used herein to include any combination of hardware, firmware, and software, employed to process data or digital signals. The hardware of a processor may include, for example, application specific integrated circuits (ASICs), general purpose or special purpose central processors (CPUs), digital signal processors (DSPs), graphics processors (GPUs), and programmable logic devices such as field programmable gate arrays (FPGAs). In a processor, as used herein, each function is performed either by hardware configured, i.e., hard-wired, to perform that function, or by more general purpose hardware, such as a CPU, configured to execute instructions stored in a non-transitory storage medium. A processor may be fabricated on a single printed wiring board (PWB) or distributed over several interconnected PWBs. A processor may contain other processors, for example, a processor may include two processors, an FPGA and a CPU, interconnected on a PWB.
[0040] According to some embodiments of the present invention, the electronic devices 108 may connect to the data network 102 using a telephone connection, satellite connection, cable connection, radio frequency communication, or any suitable wired or wireless data communication mechanism. To this end, the electronic devices 108 may take the form of, for example, a personal computer (PC), hand-held personal computer (HPC), personal digital assistant (PDA), tablet or touch screen computer system, telephone, satellite network, cellular telephone, smartphone, an augmented reality system, a virtual reality system, an autonomous vehicle, or any suitable consumer electronics device.
[0041] In one or more embodiments, a data provider 112 publishes one or more items of content 116 including an article 130, a video 122, text 124, an image 126, and/or any other suitable content 128 (e.g., links to other articles, citations, metadata, and any other suitable elements in a piece or item of content). The data provider 112 may transmit or submit the one or more items of content 116 to the server 106 for processing. In one or more embodiments, the server 106 extracts or scrapes the one or more items of content 116 from the website hosting the one or more items of content 116 (e.g., via web scraping, web harvesting, web data extraction, etc.).
[0042] The server 106 may aggregate one or more items of content 116 from multiple data providers 112 and determine which items of content (e.g., news articles, media broadcasts, or other news content) are desirable for a data feed 104 that may be provided to one or more users 110 through one or more electronic devices 108. In some embodiments, the server 106 provides multiple data feeds 104 that are different from each other to each electronic device 108, and, in some embodiments, the server 106 provides a single shared data feed to multiple electronic devices 108. The data feed 104 includes broadcasts, videos, articles, blogs, messages, images, and/or other suitable communication media content for viewing by one or more users 110. In one or more embodiments, the content of the data feed 104 is only from data providers 112 having reliable journalism credentials such as articles sourced by national or international news organizations (e.g., a white list of news organizations).
In one or more embodiments, one or more items of content 116 are cross-referenced to determine credibility for inclusion in the data feed 104.
[0043] Accordingly, the content of the data feed 104 is determined solely by the server 106, and users 110 do not have the option to take any action that would impact the data feed 104 (or a particular data feed, such as the primary/trending data feed) such as publishing their own articles or otherwise modifying the content of the data feed 104, unless previously verified as a content provider or otherwise authorized (e.g., vetted or screened). That is, in one or more embodiments, the data feed 104 is a closed system that cannot be modified by the user, other than by the user subscribing to receive articles from particular news organizations. Due to this architecture, data provided to the users 110 is drastically limited, which enables journalists, moderators, and algorithmic processes to effectively manage or monitor the content of the data feed 104 (i.e., the amount of content displayed on the data feed 104 is drastically reduced compared to an "open" system in which users can post their own content, such as articles, pictures, and videos, which enables effective moderation of the content on the data feed 104 and combats the threat of deep fakes or other data manipulation by outside software, bots, and actors). In one or more embodiments, data algorithms and other processes may be used to identify and eradicate the aforementioned types of content manipulation. Journalists and moderators may more easily identify inaccurate media content prior to and after publishing the data feed 104 compared to an "open" system in which users are freely able to post content (e.g., articles, videos, images). Additionally, a user may be "push"
notified if an article they viewed was proven via a fact check to be disinformation.
[0044] In some embodiments, the journalists and moderators may be live humans assisted by automated or semi-automated systems including algorithms detecting bias, hate speech, reliable claims, unreliable claims, inaccurate or misleading claims, and/or any other suitable functions. In some embodiments, the functions performed by journalists and moderators are performed by automated systems.
[0045] In one or more embodiments, the system 100 provides the data feed 104 including media broadcasts, videos, news articles, and/or any other web-based content for viewing by one or more users 110. The data feed 104 may present a variety of content that may be prepared, reviewed, and/or selected by the server 106.
However, a user 110 viewing the data feed 104 is presented with a finite set of options.
In one or more embodiments, users are unable to take any action that impacts the data feed, such as publishing their own articles or content as part of the data feed, unless previously verified as a content provider or otherwise authorized (e.g., vetted or screened). For example, as part of the finite set of options provided to the users 110, a user may be limited to expressing approval of (e.g., "liking") a portion (e.g., articles, images, videos, etc.) of the data feed 104, commenting on a portion of the data feed 104, playing a video or translated audio version of the data feed (e.g., "watch" or "listen"), voting on a portion of the data feed 104 (e.g., rating an article based on any relevant factor, such as bias, trustworthiness, and/or relevancy), and/or pinging (e.g., "sharing" with) another user, group, or audience with respect to a portion of the data feed 104. In one or more embodiments, the options of the users 110 are not limited to "liking", "sharing", voting or rating, watching or playing, and/or commenting on portions of the data feed 104.
Furthermore, in one or more embodiments, users may be able to upload news articles, broadcasts, and/or other media content to their "stories." However, in one or more embodiments, the software module 109 does not permit user-generated content (e.g., user-generated articles) to be posted to the "stories," unless previously verified as a content provider or otherwise authorized (e.g., vetted or screened), although small snippets of text, images, video, or audio may be permitted to be written or posted in the "stories." The stories may be public or private. In one or more embodiments, public or private stories may be shared publicly or privately (e.g., a private story may be shared privately with a VIP list of friends/subscribers, and a public story may be shared with everyone). Furthermore, in one or more embodiments, multiple private lists of VIP
access may be created for sharing stories. In one or more embodiments, the software module 109 is configured to enable users to have a personal profile that includes their contact information and relevant links to other platforms, among other things.
Additionally, in one or more embodiments, the software module 109 may enable users to message each other.
[0046] In one or more embodiments, "sharing" a portion of the data feed 104 (e.g., an article, video, and/or media broadcast) may further include "group sharing." For example, a user may form a group comprising a plurality of users (e.g., a group of friends) by inviting other users to join or form a group. Each group may have one or
more users with administration privileges. Administration privileges allow a user to "share" to the entire group (i.e., "group share") a portion of the data feed 104. For example, a user with administrative privileges may "group share" to their group a news article among other things (e.g., a journalism lecture) presented by the data feed 104 and all of the members within the group may receive a "push" notification (e.g., a link or notification that a user may click or push to access set content) from the server 106.
In one or more embodiments, users with certain administration privileges will be able to see whether or not members within their group read or viewed the content sent via "group share." In one or more embodiments, once an article is shared with a friend or a group of friends, users will be able to message each other about the shared article.
In one or more embodiments, users will be able to message or notify their friends on the platform about a piece of content (e.g., an article, video, press release, etc.) without first sharing the content. Furthermore, in one or more embodiments, a user or data provider with administration privileges may be able to post to a "group story." The group story may be public or private. In one or more embodiments, public or private group stories may be shared publicly or privately (e.g., a private group story may be shared privately with a VIP list of friends/subscribers, and a public group story may be shared with everyone). Furthermore, in one or more embodiments, multiple private lists of VIP access may be created for sharing group stories. In one or more embodiments, the features and functionality available to users with administrative privileges in a group may be the same as the features and functionality available to users not in a group, except the features and functionality available to users with administrative privileges in a group may be targeted only to those users in the group (e.g., messaging, sharing, and/or posting stories only to other group members). The present disclosure is not limited to the manner of sharing articles or other data described above, and in one or more embodiments, articles or other data may be shared by users and/or data providers in any other suitable manner.
[0047] In one or more embodiments, the server 106 sends "push"
notifications when an article among other content is "shared" or "group shared" by another user, when another user "likes" or replies to a comment belonging to the user, and/or when a significant newsworthy event occurs among other reasons. "Push"
notifications are sent to the subscribers of the publications on the platform from the administrator of a publication's "page" and may hyperlink to a news article, article, video, or media broadcast, among other things. "Push" notifications may also be sent by the server 106 when bias, hate speech, reliable claims, unreliable claims, inaccurate or misleading claims, and/or any other suitable characteristics are later identified after a user has viewed the portion of the data feed 104. Accordingly, a user may receive an update regarding portions of the data feed 104 that the user previously viewed. These
"push" notifications may be sent to the friends/subscribers of specific users, to the subscribers of a group of users that made a page, to the subscribers of particular networks, to subscribers of specific business pages, and/or to subscribers of verified journalists, media relations entities, industry professionals and experts (described below), among other things. Moreover, these "push" notifications may target a private or public group of users with any piece of content or data that provides value (e.g., a VIP list of subscribers, or multiple private lists of VIP access).
[0048] In one or more embodiments, neural network algorithms directed toward bias detection, hate speech detection, and/or claim detection may be applied by the server 106 as directed by the software module 109. For example, the software module 109 may manage a request from a user 110 or a data provider 112 to apply one or more algorithms. In response, the processor 118 of the server may execute instructions in the memory 120 corresponding to the requested algorithms.
[0049] In one or more embodiments, the bias detection, hate speech detection, and/or claim detection algorithms employ word-level, sentence-level, and article-level analyses to judge the overall characteristics of a portion of the data feed 104 to determine whether it is likely to contain bias, questionable claims, or hateful statements.
In one or more embodiments, the bias detection, hate speech detection, and/or claim detection algorithms learn from the wording of each sentence and from how sentences are contextually embedded in the portion of the data feed 104. The algorithms then specialize in capturing distinctive features that render the portion of the data feed 104 subjective. The algorithms may then provide an overall evaluation by performing sentence-level analysis, backtracking the impact of each sentence on the overall score to identify which sentences are most likely to cause the portion of the data feed 104 to be biased, hateful, reliable, unreliable, and/or factually inaccurate. In one or more embodiments, these sentences are provided to the users 110.
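The sentence-level backtracking described above can be realized, for illustration only, as a leave-one-out pass: re-score the article with each sentence removed and rank sentences by how much their removal improves the overall score. In the sketch below, `score_fn` is a hypothetical stand-in for the trained article-level model; the disclosure does not mandate this exact procedure.

```python
def rank_sentences_by_impact(sentences, score_fn):
    """Rank sentences by how strongly each one drags down the overall score.

    score_fn maps a list of sentences to a float in [0, 1], where a lower
    score indicates more bias (or hate, or unreliability). It is a
    placeholder for the trained model described above.
    """
    base = score_fn(sentences)
    impacts = []
    for i, sent in enumerate(sentences):
        # "Backtrack" each sentence by re-scoring the article without it.
        reduced = sentences[:i] + sentences[i + 1:]
        impacts.append((score_fn(reduced) - base, sent))
    # Sentences whose removal raises the score the most are the likeliest
    # causes of a low overall score; surface those first.
    impacts.sort(key=lambda pair: pair[0], reverse=True)
    return impacts
```

Sentences at the top of the returned list are the candidates to present to the users 110.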
[0050] In one or more embodiments, the bias detection method includes applying a bias detection model to a selected portion of a data feed 104 (e.g., an article). The bias detection model may be trained based on preprocessing data and may be updated or retrained periodically.
[0051] In one or more embodiments, data for preprocessing is extracted from the database 114 and/or an external database. The data includes a dataset having text that has been reviewed and labeled for objectivity and/or bias. For example, a dataset may include one or more articles, videos, or media content (or reviewed text) where each article or item of content has been independently reviewed and labeled as having a determined degree of objectivity and/or degree of bias.
[0052] In one or more embodiments, preprocessing includes acquiring vector representations for each word in an individual piece of content (e.g., an article or video)
in the dataset according to any suitable technique known in the art. For example, a technique for acquiring vector representations for words based on pre-trained word embeddings is described in an article by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova titled "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018) (last revised May 24, 2019) available at https://arxiv.org/pdf/1810.04805.pdf, the entire content of which is incorporated herein by reference.
[0053] In one or more embodiments, preprocessing further includes splitting an article (or other pieces of content) in the dataset into words and/or sentences to carry out basic natural language processing operations using any suitable technique known in the art, for example, a method of splitting an article into words may be performed using a software library for advanced natural language processing capable of tokenizing text and other various functions.
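The splitting step can be sketched with a naive regex-based splitter and tokenizer. This is an illustration only; a production pipeline would use a trained segmenter from a full NLP library such as spaCy, as the paragraph above suggests.

```python
import re

def split_sentences(text):
    # Naive splitter: break after ., !, or ? followed by whitespace.
    # A real pipeline would use a trained segmenter (e.g., spaCy).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Lowercased word tokenizer keeping alphanumeric runs and apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())
```

The resulting sentence and word lists feed the sentiment and embedding steps described next.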
[0054] After splitting a piece of content (e.g., an article) of the dataset into words and/or sentences, the emotion of the author of the words and/or sentences may be determined (e.g., on a sliding scale from negative to positive). The attitude or emotion of the author may be measured as a function of polarity and/or subjectivity.
Polarity refers to a float in the range of -1 to 1 where 1 is a positive statement and -1 is a negative statement. Subjectivity refers to a float in the range of 0 to 1 where 0 is a factual statement and 1 is a personal opinion, emotion, or judgment. In one or more embodiments, polarity and/or subjectivity are determined according to any suitable technique known in the art. As an example, a method of determining subjectivity and/or polarity may include using any software library for processing textual data which enables common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
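The polarity and subjectivity floats described above can be illustrated with a toy lexicon lookup. The lexicon values below are invented for illustration; a real system would use a sentiment library (e.g., one such as TextBlob) as the paragraph above indicates.

```python
# Toy lexicons with invented scores, standing in for a sentiment library.
POLARITY = {"great": 0.8, "good": 0.5, "bad": -0.6, "terrible": -0.9}
SUBJECTIVITY = {"great": 0.75, "good": 0.6, "bad": 0.65,
                "terrible": 0.9, "reported": 0.1}

def sentence_sentiment(tokens):
    """Return (polarity, subjectivity) averaged over matching tokens.

    Polarity lies in [-1, 1] (negative to positive statement);
    subjectivity lies in [0, 1] (factual to personal opinion).
    Tokens outside the lexicon contribute nothing.
    """
    pol = [POLARITY[t] for t in tokens if t in POLARITY]
    sub = [SUBJECTIVITY[t] for t in tokens if t in SUBJECTIVITY]
    polarity = sum(pol) / len(pol) if pol else 0.0
    subjectivity = sum(sub) / len(sub) if sub else 0.0
    return polarity, subjectivity
```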
[0055] In one or more embodiments, the preprocessed data from the dataset is formed into a stack including the source text, vector representation of the source text, subjectivity, and polarity. The stack formed from preprocessed data may be used to create a dictionary for the various features of each article (or reviewed text), and the dictionary may be appended to the preprocessed data in a database (e.g., database 114 and/or an external database) for future access. In this manner, the dictionary may be quickly consulted without having to preprocess the dataset again.
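The per-article "stack" and dictionary described above might be assembled as follows. The `embed_fn` and `sentiment_fn` parameters are hypothetical placeholders for the embedding and polarity/subjectivity steps; the record can then be cached in a database so the dataset need not be preprocessed again.

```python
def build_feature_record(article_id, sentences, embed_fn, sentiment_fn):
    """Assemble the preprocessed feature dictionary for one article.

    embed_fn: sentence -> vector; sentiment_fn: sentence -> (polarity,
    subjectivity). Both are stand-ins for the steps described above.
    """
    record = {"id": article_id, "sentences": []}
    for sent in sentences:
        polarity, subjectivity = sentiment_fn(sent)
        record["sentences"].append({
            "text": sent,                 # source text
            "vector": embed_fn(sent),     # vector representation
            "polarity": polarity,
            "subjectivity": subjectivity,
        })
    return record
```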
[0056] In one or more embodiments, preprocessing includes processing the text of the body of an article and the text of a title (e.g., an article headline) of the article as separate categories with both the title and the body relating to a single article.
Therefore, comparisons may be made between the title and the body of the article as discussed in more detail below.
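The disclosure does not fix a metric for the title-versus-body comparison; one common choice, shown here purely as a sketch, is the cosine similarity between the title vector and the averaged body-sentence vectors.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors; 1.0 means same direction.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def title_body_agreement(title_vec, sentence_vecs):
    # Compare the title vector against the mean of the body-sentence vectors.
    dim = len(title_vec)
    mean_body = [sum(vec[d] for vec in sentence_vecs) / len(sentence_vecs)
                 for d in range(dim)]
    return cosine_similarity(title_vec, mean_body)
```

A low agreement score could flag a headline that misrepresents the body of the article.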
[0057] Accordingly, several features are determined, stacked, and returned to the bias detection model (e.g., the main overall program or system that calls the various functions and receives data from those functions) by various preprocessing functions for modularity (i.e., each function is a separate module and gets the input, carries out its processing, and returns the processed output to the main running program).
[0058] In one or more embodiments, the preprocessed data (e.g., data including the dictionary) in the database is consulted or extracted to train the bias detection model. The preprocessed data may include any number of media related content, for example, a single article or thousands of articles and/or videos, broadcasts, etc. The preprocessed data may be used as training data where the preprocessed data is fed into a main training loop. During the training loop, padding may be carried out on the training loop to make it uniform prior to feeding the padded data through a custom loader that creates batches on which training can be carried out.
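The padding and batching step of the training loop can be sketched as below. This mirrors the custom loader described above in plain Python; the sentence-count padding and batch size are illustrative, and a real pipeline would typically use a framework data loader (e.g., a PyTorch DataLoader with a collate function).

```python
def pad_and_batch(articles, batch_size, pad_vector):
    """Pad variable-length articles to a uniform sentence count, then batch.

    articles: list of articles, each a list of sentence vectors.
    pad_vector: the vector appended to shorter articles (e.g., zeros).
    """
    max_len = max(len(a) for a in articles)
    # Pad every article to the longest article's sentence count.
    padded = [a + [pad_vector] * (max_len - len(a)) for a in articles]
    # Slice the uniform data into fixed-size batches for training.
    return [padded[i:i + batch_size] for i in range(0, len(padded), batch_size)]
```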
[0059] In one or more embodiments, training includes using a linear attention class.
The linear attention class is configured to calculate the weighted vector summation according to the attention weight that each sentence pays to the title vector.
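The weighted vector summation with respect to the title vector may be sketched as a softmax attention over sentence-title similarities. This is a minimal stand-in, assuming dot-product similarity; the embodiment describes a learned linear attention class, whose parameters are not specified here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def title_attention(sentence_vecs, title_vec):
    """Softmax each sentence's similarity to the title vector, then return
    the attention weights and the weighted vector summation (context)."""
    scores = [dot(s, title_vec) for s in sentence_vecs]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * s[d] for w, s in zip(weights, sentence_vecs))
               for d in range(len(title_vec))]
    return weights, context
```

Sentences more similar to the title receive larger attention weights and therefore contribute more to the summed context vector.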
[0060] In one or more embodiments, training includes using a first convolutional neural network ("CNN") class (e.g., a time distributed CNN class) and a second CNN
class. The first CNN class is configured to combine the separate word level vectors into sentence level vectors by applying various filters to the input data. In one or more embodiments, the first CNN class only operates on the vector representations.
In other words, in one or more embodiments, the polarity and the subjectivity are not processed by the first CNN class. The second CNN class is configured to combine polarity and subjectivity from word level to sentence level. Accordingly, using the first CNN class and the second CNN class on the preprocessed data results in each sentence of the preprocessed data having a vector representation.
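The word-to-sentence combination performed by the first CNN class can be illustrated with a single one-dimensional convolution filter followed by max-pooling over positions. This is a sketch under stated assumptions (one filter, a fixed kernel); the described CNN classes would apply multiple learned filters.

```python
def conv1d(word_vecs, kernel):
    """Apply one convolution filter across word positions of an article
    sentence (each word is a fixed-dimension vector)."""
    k = len(kernel)
    dim = len(word_vecs[0])
    return [[sum(kernel[j] * word_vecs[i + j][d] for j in range(k))
             for d in range(dim)]
            for i in range(len(word_vecs) - k + 1)]

def sentence_vector(word_vecs, kernel=(0.25, 0.5, 0.25)):
    """Collapse word-level vectors into one sentence-level vector by
    convolving and then max-pooling over positions."""
    feature_maps = conv1d(word_vecs, kernel)
    dim = len(word_vecs[0])
    return [max(f[d] for f in feature_maps) for d in range(dim)]
```

The second CNN class would apply the same pattern to the scalar polarity and subjectivity sequences to obtain sentence-level polarity and subjectivity.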
[0061] In one or more embodiments, training includes using a main model class CNN to combine sentence vectors into final vectors for classification. The main model class CNN calls on the attention class for calculating global attention with respect to the title vector. The forward function then passes the sentence vector into the main model class CNN architecture. The output of the main model class CNN is then concatenated with the reduced global attention vector, and the sentence level polarity and subjectivity. This is passed through a sigmoid and a bias score (e.g., a probability score) is determined. In one or more embodiments, a lower score indicates more bias while a higher score indicates less bias. In one or more embodiments, the determined bias score is compared to the bias score of a ground truth. The ground truth in this context may refer to bias scores determined based on previous bias detection models and databases comprising articles or other content that have been reviewed and
labeled for objectivity and/or bias. The main model class CNN may use cross entropy loss and the bias detection model may be saved after a set number of epochs to improve inference performance. Although "lower score" and "higher score" are used to indicate "more bias" and "less bias" respectively, one of ordinary skill in the art would appreciate that any suitable scoring system may be used to indicate "more bias"
or "less bias". For example, in one or more embodiments, a lower score may indicate less bias while a higher score indicates more bias.
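The final scoring head described above may be sketched as a concatenation followed by a linear projection and sigmoid. The weights shown are placeholders; in the described embodiment they would be learned by minimizing cross entropy loss against the ground-truth labels.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bias_score(cnn_out, attention_context, polarity, subjectivity,
               weights, bias=0.0):
    """Concatenate the main-CNN output, the reduced attention context, and
    the sentence-level polarity/subjectivity, then project through a sigmoid
    to a probability-style bias score (higher = less biased in the
    convention described above)."""
    features = list(cnn_out) + list(attention_context) + [polarity, subjectivity]
    logit = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(logit)
```

With all-zero weights the score is exactly 0.5, i.e., maximally uncertain; training moves the weights so that objective articles score nearer one end of the scale.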
[0062]
In one or more embodiments, the bias detection model loads saved or trained parameters to perform inferencing.
During inferencing, preprocessing operations are performed on the test article or other content to convert the article into input vector format. The test article is then fed into the bias detection model to return a separate score for each sentence of the test article. In one or more embodiments, the bias detection model is applied multiple times to modified versions of the test article to identify the impact of each sentence in the test article on the bias score (i.e., the bias detection model is applied iteratively). For example, in one or more embodiments, one or more sentences may be removed from the content and reintroduced between successive applications of the bias detection model.
Based on the change in the bias score when a sentence is removed compared to when the sentence was present, the importance of the sentence to the bias score may be determined.
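The leave-one-sentence-out attribution described above may be sketched as follows, where `score_fn` is any scoring function (e.g., the trained bias detection model; here a toy stand-in is used).

```python
def sentence_impacts(sentences, score_fn):
    """Score the full article, then re-score with each sentence removed in
    turn; the absolute change in score attributes an impact to each
    sentence."""
    base = score_fn(sentences)
    return [abs(base - score_fn(sentences[:i] + sentences[i + 1:]))
            for i in range(len(sentences))]
```

Each sentence is removed and then reintroduced before the next removal, so every application of the model sees the article with exactly one sentence missing.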
[0063]
In one or more embodiments, the impact of every sentence is measured and, for example, the top fifth of total sentences that had the largest impact on (e.g., largest change in) the bias score may be identified by the server as biased sentences.
The biased sentences may be provided (e.g., displayed) to a user viewing the portion of the data feed including the article (or other content, such as a media broadcast) that inferencing is performed on. In one or more embodiments, the sentences having a bias score above a threshold value (e.g., a fixed objective threshold bias score, a bias score in a top percentile (such as the top 20%, the top 15%, or the top 10%) of all of the bias scores of the sentences in the article, video, or other content, or a bias score higher than the next highest bias score by a threshold amount) may be provided (e.g., displayed) to the user.
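The top-fraction selection (e.g., the top fifth of sentences by impact) may be sketched as:

```python
def flag_top_sentences(sentences, impacts, fraction=0.2):
    """Select the top `fraction` of sentences by impact on the score (e.g.,
    the top 20%) for display to the user, preserving article order."""
    count = max(1, round(len(sentences) * fraction))
    ranked = sorted(range(len(sentences)), key=impacts.__getitem__,
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:count])]
```

The same selection routine applies whether the threshold is a percentile, a fixed score cutoff, or a gap to the next-highest score, as the alternatives above describe.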
[0064]
In one or more embodiments, the claim detection method includes applying a claim detection model to a selected portion of a data feed 104 (e.g., an article, media broadcast, etc.). The claim detection model may be trained based on preprocessing data and may be updated or retrained periodically.
[0065]
In one or more embodiments, data for preprocessing is extracted from the database 114 and/or an external database. The data includes a dataset having text that has been reviewed and labeled for reliability and/or fakeness. For example, a
dataset may include one or more articles (or reviewed text) where each article has been independently reviewed and labeled as having a determined degree of reliability and/or a degree of fakeness. A portion of or all of the data may be retrieved from a third party fake news corpus (e.g., a publicly available news dataset of articles, videos, or other content that have been reviewed and labeled as fake, reliable, and/or unreliable). In one or more embodiments, the data used for the bias detection method may be used to supplement data from a fake news corpus. Although the data used for bias detection may only include labels for objectivity and subjectivity, in one or more embodiments, the data is from reliable data providers, and therefore, the data may be labeled as reliable for the purposes of the claim detection method.
[0066] In one or more embodiments, preprocessing for the claim detection method is similar to the preprocessing for the bias detection method. For example, the preprocessing for the claim detection method includes the same functions as the bias detection method in addition to some additional functions directed toward reliability and/or fakeness. Additional functions may include requesting articles or other media content from a fake news corpus having a specified label (e.g., fake, reliable, and/or unreliable) and loading data from the fake news corpus with the fake, reliable, and/or unreliable labels removed for training purposes.
[0067] In one or more embodiments, preprocessed data in a database is consulted or extracted to train the claim detection model. The preprocessed data may include any number of articles or other media content, for example, a single article or thousands of articles. The preprocessed data may be used as training data where the preprocessed data is fed into a main training loop. During the training loop, padding may be carried out on the training data to make it uniform prior to feeding the padded data through a custom loader that creates batches on which training can be carried out. In the case of the claim detection model, padding is carried out in a different manner compared to the bias detection model because a different padding method provides more robust and better results for claim detection. For example, the claim detection model may use articles or other content from the fake news corpus which may be in a tensor format. Therefore, in one or more embodiments, the padding function uses tensor attributes to determine the shapes of the input tensors and perform calculations and padding. In contrast, the bias detection method may not use tensor attributes because the implementation may be a list of vectors in the dictionary.
Therefore, padding for the bias detection method may be performed in a different manner compared to the claim detection model.
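The shape-aware padding used for the claim detection model may be sketched as below. Nested lists stand in for framework tensors, and `tensor_shape` mimics reading a tensor's shape attribute; a real implementation would query the tensor object directly.

```python
def tensor_shape(rows):
    """Recover (length, dim) from a nested-list 'tensor' (a stand-in for a
    framework tensor's shape attribute)."""
    return (len(rows), len(rows[0]))

def pad_tensor(rows, target_len, fill=0.0):
    """Pad along the sentence axis using the inferred shape, mirroring how
    the claim model's padding function uses tensor attributes."""
    length, dim = tensor_shape(rows)
    return rows + [[fill] * dim] * (target_len - length)
```

The bias detection method, by contrast, works on a dictionary of vector lists, so its padding inspects list lengths rather than tensor shapes.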
[0068] In one or more embodiments, training for the claim detection model is similar to training for the bias detection model. For example, the training for the claim detection model uses a linear attention class, a first CNN, a second CNN, and a main model
class CNN as described in the training for the bias detection model with suitable changes to accommodate fake, reliable, and/or unreliable labels. In one or more embodiments, the claim detection model uses objectivity and/or polarity in addition to other suitable features for improved results (e.g., a more robust or accurate model).
[0069] Accordingly, the training for the claim detection model may determine a claim score (e.g., a probability score). In one or more embodiments, a lower score indicates that an article or other media content is more "claimy" (e.g., the article requires further verification for accuracy and/or reliability) while a higher score indicates that an article is less "claimy". In one or more embodiments, the determined claim score is used to categorize the article and is compared to the claim score of a ground truth. The ground truth in this context may refer to claim scores determined based on previous claim detection models and databases comprising articles that have been reviewed and labeled for fakeness, reliability, and/or unreliability. The main model class CNN may use cross entropy loss and the claim detection model may be saved after a set number of epochs to improve inference performance. Although "lower score" and "higher score" are used to indicate "more claimy" and "less claimy"
respectively, one of ordinary skill in the art would appreciate that any suitable scoring system may be used to indicate "more claimy" or "less claimy". For example, in one or more embodiments, a lower score may indicate less "claimy" while a higher score indicates more "claimy".
[0070] In one or more embodiments, the claim detection model performs inferencing in the same manner as described for the bias detection model.
Therefore, the impact of every sentence may be measured and, for example, the top fifth of total sentences that had the largest impact on (e.g., largest change in) the claim score may be identified by the server as "claimy" sentences. The "claimy" sentences may be provided (e.g., displayed) to a user viewing the portion of the data feed including the text that inferencing is performed on. In one or more embodiments, the sentences having a claim score above a threshold value (e.g., a fixed objective threshold claim score, a claim score in a top percentile (such as the top 20%, the top 15%, or the top 10%) of all of the claim scores of the sentences in the article or other content, or a claim score higher than the next highest claim score by a threshold amount) may be provided (e.g., displayed) to the user.
[0071] In one or more embodiments, the hate speech detection method includes applying a hate speech detection model to a selected portion of a data feed 104 (e.g., an article, media broadcast, and/or social media post). The hate speech detection model may be trained based on preprocessing data and may be updated or retrained periodically.
[0072] In one or more embodiments, data for preprocessing is extracted from the database 114 and/or an external database. The data includes a dataset having text that has been reviewed and labeled for hatefulness. For example, a dataset may include one or more articles and/or other media content (or reviewed text) where each article has been independently reviewed and labeled as having a determined degree of hatefulness based on racism, misogyny, homophobia, and/or other forms of discrimination. A portion of or all of the data may be retrieved from a third party fake news corpus (e.g., a publicly available news dataset of articles that have been reviewed and labeled as hateful). In one or more embodiments, the data used for the bias detection method may be used to supplement data from a fake news corpus.
Although the data used for bias detection may only include labels for objectivity and subjectivity, in one or more embodiments, the data is from reliable data providers, and therefore, the data may be labeled as not hateful for the purposes of the hate speech detection method.
[0073] In one or more embodiments, preprocessing for the hate speech detection method is similar to the preprocessing for the claim detection method. For example, the preprocessing for the hate speech detection method includes the same functions as the claim detection method, however, the functions may call indicators of or labels of hatefulness (e.g., racism, misogyny, homophobia, and/or other forms of discrimination) rather than reliability and/or fakeness. In one or more embodiments, the labels for hatefulness are also removed for training purposes.
[0074] In one or more embodiments, preprocessed data in the database is consulted or extracted to train the hate speech detection model. The preprocessed data may include any number of articles or other media content, for example, a single article or thousands of articles, videos, etc. The preprocessed data may be used as training data where the preprocessed data is fed into a main training loop.
During the training loop, padding may be carried out on the training data to make it uniform prior to feeding the padded data through a custom loader that creates batches on which training can be carried out.
[0075] In one or more embodiments, training for the hate speech detection model is similar to training for the bias detection model. For example, the training for hate speech detection model uses a linear attention class, a first CNN, a second CNN, and a main model class CNN as described in the training for the bias detection model with suitable changes to accommodate hateful labels. In one or more embodiments, the hate speech detection model uses objectivity and/or polarity in addition to other suitable features for improved results (e.g., a more robust or accurate model).
[0076] Accordingly, the training for the hate speech detection model may determine a hate score (e.g., a probability score). In one or more embodiments, a lower score indicates more hate while a higher score indicates less hate. In one or more embodiments, the determined hate score is compared to the hate score of a ground truth. The ground truth in this context may refer to hate scores determined based on previous hate speech detection models and databases comprising articles or other content that have been reviewed and labeled for indicators of hatefulness. The main model class CNN may use cross entropy loss and the hate speech detection model may be saved after a set number of epochs to improve inference performance. Although "lower score" and "higher score" are used to indicate "more hate" and "less hate"
respectively, one of ordinary skill in the art would appreciate that any suitable scoring system may be used to indicate "more hate" or "less hate". For example, in one or more embodiments, a lower score may indicate less hate while a higher score indicates more hate.
[0077] In one or more embodiments, the hate speech detection model performs inferencing in the same manner as described for the bias detection model.
Therefore, the impact of every sentence may be measured and, for example, the top fifth of total sentences that had the largest impact on (e.g., largest change in) the hate score may be identified by the server as hateful sentences. The hateful sentences may be provided (e.g., displayed) to a user viewing the portion of the data feed including the article or other piece of content (e.g., media broadcast) that inferencing is performed on. In one or more embodiments, the sentences having a hate score above a threshold value (e.g., a fixed objective threshold hate score, a hate score in a top percentile (such as the top 20%, the top 15%, or the top 10%) of all of the hate scores of the sentences in the content's text, or a hate score higher than the next highest hate score by a threshold amount) may be provided (e.g., displayed) to the user.
[0078] In one or more embodiments, the bias detection method, the claim detection method, and the hate speech detection method may be concurrently (e.g., simultaneously) applied and the resulting sentences may be concurrently (e.g., simultaneously) provided to a user.
[0079] In one or more embodiments, neural network algorithms directed toward content summarization may be applied by the server 106 as directed by the software module 109. For example, the software module 109, in one or more embodiments, may manage a request from a user 110 or data providers 112 to summarize an article or other content. In response, the processor 118 of the server may execute instructions in the memory 120 corresponding to a summarizer to summarize the article.
[0080] In one or more embodiments, the summarizer is trained to capture journalistic style highlights of the content 116 (e.g., the article(s) 130, the video(s) 122, the text 124, the image(s) 126, and/or other content 128). In the case in which the
content 116 includes one or more articles 130, the algorithm learns to identify important sentences in varying contexts by paying attention not only to the global interpretation of a sentence but also to how the sentence interacts with other sentences within the article 130.
[0081] In one or more embodiments, the summarizer method includes applying a summarizer model to a selected portion of a data feed 104 (e.g., an article, a media broadcast, and/or a 3rd party platform's media content). The summarizer model may be trained based on preprocessing data and may be updated or retrained periodically.
[0082] In one or more embodiments, data for preprocessing is extracted from the database 114 and/or an external database. The data may include a dataset specifically for summarization tasks (e.g., any suitable news datasets built for summarization or gisting purposes that can be used for training). For example, the dataset may include an article and a summary (e.g., a collection of highlights) as the ground truth.
[0083] In one or more embodiments, preprocessing includes loading a dataset including one or more news articles and/or other media content. Each article of the one or more news articles may be split into a story, highlights, and/or a title (or headline). Preprocessing further includes using an encoder to provide vector representations of each article. The encoder may be based on any suitable language representation model to find vector representations/embeddings of the words in articles of the dataset; for example, a suitable language representation model is described in an article by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova titled "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
(2018) (last revised May 24, 2019) available at https://arxiv.org/pdf/1810.04805.pdf.
In one or more embodiments, the encoder may stack the source text, the vector representation of the text, and the label vectors. The stack formed from preprocessed data may be used to create a dictionary for the various features of each article (or reviewed text), and the dictionary may be appended to the preprocessed data in a database (e.g., database 114 and/or an external database) for future access.
In this manner, the dictionary may be quickly consulted without having to preprocess the dataset again.
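The feature dictionary and its caching behavior could be sketched as follows. The `encode` callable is a stand-in for a BERT-style encoder, and all names are illustrative:

```python
def build_feature_dictionary(articles, encode):
    # encode: callable mapping a sentence to a vector; a stand-in for a
    # BERT-style encoder (the real system would use learned embeddings).
    dictionary = []
    for article in articles:
        sentences = [s for s in article["story"].split(". ") if s]
        dictionary.append({
            "source_text": sentences,                      # raw sentences
            "embeddings": [encode(s) for s in sentences],  # vector per sentence
            "labels": [int(s in article["highlights"]) for s in sentences],
        })
    return dictionary

_CACHE = {}

def get_feature_dictionary(dataset_id, articles, encode):
    # Consult the stored dictionary first so the dataset never has to be
    # preprocessed a second time.
    if dataset_id not in _CACHE:
        _CACHE[dataset_id] = build_feature_dictionary(articles, encode)
    return _CACHE[dataset_id]
```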
[0084] In one or more embodiments, training capitalizes on transfer learning between a long short term memory ("LSTM") network and a transformer network.
The transformer network starts learning using the parameters from the LSTM network and then continues training in the usual fashion. Using both the LSTM network and the transformer network provides better results than using either network alone.
[0085] In one or more embodiments, the LSTM training method includes loading the preprocessed data (e.g., files stored by a preprocessor in a previous phase). The
LSTM training method includes creating a custom data loader for loading the actual records and making batches out of them to supply to a main training loop. To do this, the LSTM training method includes uniformly padding the input to be of constant size.
Both the vector representations of text and labels may be padded.
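The uniform padding step can be sketched as a small helper that pads every sequence in a batch (embeddings or labels alike) to the length of the longest one; the function name and pad value are illustrative:

```python
def pad_batch(sequences, pad_value=0):
    # Uniformly pad every sequence to the length of the longest one so the
    # batch has a constant size for the main training loop.
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
```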
[0086] In one or more embodiments, the main LSTM network is defined in the LSTM class. The final linear layer of the LSTM network may convert the data to linear dimensions, and a softmax may be carried out. The logits may then be reshaped again and returned. For the loss, paddings may be disregarded, and classification labels may be compared using cross-entropy loss, which may be tuned to give a much higher penalty when the model predicts the wrong output. The LSTM training method further includes calculating the accuracy according to different relevant factors using any suitable metric known to one of ordinary skill in the art. The training loop uses this network model and updates parameters accordingly. The training loop may then save the trained parameters so that they can be used by the transformer network.
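The padding-aware, weighted cross-entropy loss described above can be illustrated as follows. This is a minimal sketch, not the disclosed implementation; the pad label value and weighting scheme are assumptions:

```python
import math

def masked_cross_entropy(logits, labels, class_weights, pad_label=-1):
    # logits: per-position raw scores per class; labels: gold class index per
    # position. Padding positions (label == pad_label) are disregarded, and
    # class_weights can impose a much higher penalty on wrong outputs.
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == pad_label:          # disregard padding
            continue
        exps = [math.exp(s) for s in scores]
        prob = exps[label] / sum(exps)  # softmax probability of the gold class
        total += -class_weights[label] * math.log(prob)
        count += 1
    return total / max(count, 1)
```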
[0087] In one or more embodiments, the transformer training method includes using two LSTM layers at the start of the network. The two LSTM layers may be the same layers as in the LSTM network and may be used to load the pre-trained parameters from the LSTM network. The layers may then pass on the logits to the main transformer network.
[0088] In one or more embodiments, the transformer training method includes a multiheaded attention class, a norm class, and a feedforward layer class. The multiheaded attention class is configured to calculate the multiheaded attention values. Multiheaded attention allows the attention module to focus on different positions and gives the attention layer multiple "representation subspaces". The norm class is configured to calculate the mathematically stable norm values used in transformer calculations. The feedforward layer class is configured to define feedforward layers for the transformer to use in the network.
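Multiheaded attention in the sense described above can be sketched as below. Random matrices stand in for the learned projection weights, and the function signature is illustrative only:

```python
import numpy as np

def multi_head_attention(x, num_heads, seed=0):
    # x: (seq_len, d_model) token representations. Each head projects into its
    # own d_model // num_heads "representation subspace", so different heads
    # can focus on different positions.
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = Q @ K.T / np.sqrt(d_head)               # scaled dot-product
        scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
        heads.append(weights @ V)                        # per-head attention output
    return np.concatenate(heads, axis=-1)                # back to (seq_len, d_model)
```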
[0089] In one or more embodiments, the transformer training method includes a transformer class. The transformer class is configured to use the multiheaded attention class, the norm class, and the feedforward layer class to create a full transformer model. The transformer class passes the input through a series of norm operations into the multiheaded attention layer and then into the feedforward layers.
[0090] In one or more embodiments, the transformer training method includes a main transformer class, which is configured to encapsulate all of the above modules into a functional pipeline. The transformer class includes a custom initialization function to load pre-trained parameters from the LSTM
network if transfer learning is set to True. The transformer class initializes the transformer according to the arguments (e.g., number of transformers and number of
attention heads). In one or more embodiments, the arguments can be customized as the dataset evolves and new hyperparameters are needed.
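The custom initialization with optional transfer learning could be sketched as below. Parameters are modeled as plain dicts for illustration; a real implementation would use framework state dicts, and every name here is hypothetical:

```python
def init_transformer(lstm_checkpoint, num_blocks, num_heads, transfer_learning=True):
    # The two leading LSTM layers mirror the layers of the pretrained
    # LSTM network described earlier.
    params = {
        "lstm_layers": [{"weights": None}, {"weights": None}],
        "blocks": [{"num_heads": num_heads} for _ in range(num_blocks)],
    }
    if transfer_learning:
        # Seed the leading LSTM layers with the parameters saved by the
        # LSTM training loop; training then continues in the usual fashion.
        for layer, pretrained in zip(params["lstm_layers"],
                                     lstm_checkpoint["lstm_layers"]):
            layer["weights"] = pretrained["weights"]
    return params
```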
[0091] In one or more embodiments, the loss for the transformer training method is calculated in a manner similar to the LSTM training method. The transformer training method computes actual predictions from logits to evaluate the loss. In one or more embodiments, cross entropy is tuned or retuned to a different value to provide the required level of penalty. The training loop incorporates the above network, evaluates the loss, and updates the parameters. The parameters are then saved for inferencing.
[0092] In one or more embodiments, the summarizer model loads the pretrained weights saved from training. The summarizer model then takes the test content's text (e.g., articles, videos, etc.), passes it through preprocessing (e.g., without labels), creates a batch and passes the encoded content into the network set to eval mode.
Using the softmax logits the summarizer model computes a summary score for each sentence. This summary score refers to the probability of the sentence being in the summary.
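The conversion from softmax logits to a per-sentence summary score can be sketched as follows, assuming two logits per sentence (not-in-summary, in-summary); the two-class layout is an assumption for illustration:

```python
import math

def summary_scores(sentence_logits):
    # Each entry holds two logits: (not-in-summary, in-summary). The summary
    # score is the softmax probability of the "in-summary" class, i.e. the
    # probability of the sentence being in the summary.
    scores = []
    for out_logit, in_logit in sentence_logits:
        m = max(out_logit, in_logit)  # numerical stability
        e_out, e_in = math.exp(out_logit - m), math.exp(in_logit - m)
        scores.append(e_in / (e_out + e_in))
    return scores
```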
[0093] In one or more embodiments, the sentences are sorted according to their summary score in decreasing order, and the top sentences are chosen for the summary according to various requirements of the current summarization task (e.g., minimum number of sentences, maximum number of sentences, length range of sentences, etc.). The number of top sentences chosen may be adjusted from one sentence to the maximum number of sentences in an article as desired.
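The sorting and task-constrained selection described above can be sketched as below. The fallback behavior when the length filter leaves too few sentences is an assumption, not part of the disclosure:

```python
def choose_summary_sentences(sentences, scores, min_sentences=1, max_sentences=3,
                             length_range=(20, 400)):
    # Rank by summary score in decreasing order, apply the per-task length
    # constraint, then clamp the count to [min_sentences, max_sentences].
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    eligible = [i for i in ranked
                if length_range[0] <= len(sentences[i]) <= length_range[1]]
    chosen = eligible[:max_sentences]
    if len(chosen) < min_sentences:   # relax the length filter if too few remain
        chosen = ranked[:min_sentences]
    return sorted(chosen)             # restore original article order
```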
[0094] The top sentences may be provided (e.g., displayed) to a user viewing the portion of the data feed including the article and/or other content (e.g., a media broadcast) that inferencing is performed on. In one or more embodiments, the sentences having a summary score above a threshold value (e.g., a fixed objective threshold summary score, a summary score in a top percentile (such as the top
20%, the top 15%, or the top 10%) of all of the summary scores of the sentences in the article, or a summary score higher than the next highest summary score by a threshold amount) may be provided (e.g., displayed) to the user.
[0095] Because the summarizer model breaks articles into words and because news articles and other content may be written in various styles, the summarizer model sometimes loses the punctuation and syntactic information when outputting a rough summary. In one or more embodiments, this issue is addressed by a separate post processing pipeline which uses commands from a software library to clean and properly punctuate the text. Accordingly, the output from the summarizer model is passed through the post processing pipeline and a neat summary is generated.
In one or more embodiments, the post processing pipeline can be used for several text cleaning tasks to provide neat text to users beyond just the output of the summarizer model.
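A minimal stand-in for such a post processing pipeline is sketched below; the disclosure refers to commands from a software library, so these simple rules (whitespace normalization, capitalization, terminal punctuation) are illustrative assumptions:

```python
import re

def tidy_summary(rough_sentences):
    # Clean and properly punctuate the rough summary: normalize whitespace,
    # capitalize the first letter, and restore end-of-sentence punctuation.
    cleaned = []
    for s in rough_sentences:
        s = re.sub(r"\s+", " ", s).strip()
        if not s:
            continue
        s = s[0].upper() + s[1:]
        if s[-1] not in ".!?":
            s += "."
        cleaned.append(s)
    return " ".join(cleaned)
```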
[0096] FIG. 6 is a screenshot of a platform displayed on the electronic device 108 of the user 110 by the software module 109. The platform includes a data feed showing various news articles and other content (e.g., media broadcasts, video, etc.) from data providers 112. The articles may be posted by the data providers 112 or otherwise obtained from the data providers (e.g., pulled from the data providers 112).
In one or more embodiments, the software module 109 is configured to enable a user 110 to "subscribe" to one or more particular data providers 112. For example, in one or more embodiments, the software module 109 may be configured to display an icon on the data feed 104 of the electronic device 108 of the user 110, which, when selected by the user, subscribes the user to the associated data provider 112, as shown in FIG.
7. The data provider 112 may be a news organization, a vetted user or platform, or a business among other things, such as the BBC, PBS, CNN, or AP News. In one or more embodiments, when a user 110 is subscribed to a data provider 112, the data provider 112 is able to send push notifications to the user 110 (i.e., the data provider 112 is able to send push notifications to subscribed users 110). These push notifications may include notifications of content recently posted by the data provider 112 or pulled from the data provider 112. In this manner, the software module enables the data providers 112 to send targeted notifications to those users 110 who are subscribed to the data provider 112.
[0097] In one or more embodiments, the software module 109 is configured to display, on the data feed 104 of the electronic device 108 of the user 110, a page that is specific to the data provider 112 (e.g., a business page or journalist profile). The specific page for the data provider 112 may display press releases and/or sponsored articles containing content related to the data provider 112. In one or more embodiments, the data provider 112 (e.g., a business or journalist) may link to its website or other information pertinent to its business on its specific page (e.g., email, cell phone, etc.). In one or more embodiments, the specific page for the data provider 112 may be limited to display only articles or other text-based content (e.g., the specific webpage for the data provider 112 may or may not include purely visual content, such as images, graphics, or photographs). In one or more embodiments, the specific page of the data provider 112 may include articles containing text and audiovisual content (e.g., image(s) and/or video(s)). Additionally, in one or more embodiments, content (e.g., sponsored articles or press releases, among other data) published on the specific page for the data provider 112 may be sent, by the data provider 112, to news organizations, persons of interest, or specific journalists via a URL or hyperlink, an email, outside social media profiles, or a platform profile (if a journalist has created
one), among other ways, to generate press or other reputational gain or insight for the data provider 112. Additional data pertinent to media relations groups, journalists, topic experts, university professionals or others may be sent via "push"
notifications.
Furthermore, in one or more embodiments, the content (e.g., "sponsored"
articles, contact information, press releases, etc.) posted on the specific page for the data provider 112 may be sent, by the data provider 112 or other users, to other social platforms, such as Facebook, Twitter, Google, or LinkedIn. The content may be posted to the other social platforms in real time or may be scheduled for publication at a future time on the same day or at a future date and time. Furthermore, the software module 109 may be configured to enable the data provider 112 (e.g., the business) to send push notifications regarding the content (e.g., articles or press releases) published on the data provider's specific page to its subscribers. In this manner, the page that is specific to the data provider 112 makes it easy for customers or potential customers to learn about the data provider's 112 business and the goods and/or services offered by the data provider 112. Furthermore, in one or more embodiments, the specific page for the data provider 112 (e.g., the business page) may include RSS (i.e., Really Simple Syndication or RDF Site Summary) to enable the specific page to be fetched by an RSS feed reader and, for example, displayed with information from other sites in a news aggregator. In one or more embodiments, the specific page for the data provider 112 may also enable the data provider 112 to make the page private or public to different lists of subscribers or sets of users who are granted administrative access.
[0098] In one or more embodiments, the software module 109 may be configured to display an advertisement portal in which an advertisement buyer (e.g., a user or business) can create and promote a sponsored article and/or a press release to other users on the platform. In one or more embodiments, advertisement buyers may be able to create and purchase an ad and target it to the general platform data feed 104, to users who subscribe to specific data providers (e.g., newspapers, journalists, publishers, etc.), and/or to users matching specified demographic, geographic, psychographic, and/or behavioral segmentations.
[0099] In one or more embodiments, the software module 109 is configured to enable data (e.g., an article, a "sponsored" article, a media broadcast, or a press release) displayed on the data feed 104 (e.g., a user's data feed or a data provider's specific page) to be sent, by the user 110, the data provider 112 (e.g., a publisher), or a business, to other platforms, such as Facebook, Twitter, or LinkedIn. The content may be posted to the other platforms in real time or may be scheduled for publication at a future time on the same day or at a future date and time.
[00100] Additionally, in one or more embodiments, the software module 109 is configured to generate a URL for tracking and displaying the number of times specific
content and user data (e.g., an article, contact information, a press release, trends, etc.) was posted on its system and to other social platforms, and displaying the profiles (e.g., names) of the businesses or individuals who received, viewed, or shared the content. In this manner, businesses may enter the URL into an electronic device 108 and determine how many times an article, among other content, was received, viewed, or shared, and which key people of interest received, viewed, or shared it. In one or more embodiments, the software module 109 is configured to generate and display an analytics dashboard to display the effectiveness and metrics of a data provider's URLs, subscriber growth, and purchased advertisements, among other things.
[00101] In one or more embodiments, the software module 109 is configured to generate audio, utilizing text-to-speech artificial intelligence, based on the content (e.g., news articles, videos, press releases, etc.) of the data provider 112 displayed on the user's data feed 104 (e.g., the business page of the data provider, or the data feed displaying data from multiple data providers). For instance, in one or more embodiments, the software module 109 may be configured to display an icon on the data feed 104 of the electronic device 108 of the user 110, which, when selected by the user, generates audio from a speaker of the user's electronic device 108 based on the text contained in the associated article or piece of content. The audio may contain several different "voices" and algorithms may be used to shuffle the voices or to create "playlists" of news articles and/or other media related content. Additionally, in one or more embodiments, the software module 109 is configured to display audiovisual pages that indicate to the user 110 how long they have been listening, the content's cover, and/or how long it will take for them to complete listening to an article (i.e., the time that has elapsed and/or the time remaining to completion).
[00102] FIGS. 8A-8G depict screenshots of a mobile application incorporating the algorithms and other functionality of the present disclosure.
[00103] In one or more embodiments, the algorithms, features, and/or methodologies described herein may be utilized by a content publisher (e.g., a newspaper) to generate digital recurring revenue (e.g., increased digital subscriptions and advertisements).
[00104] The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to limit the example embodiments described herein.
[00105] As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
[00106] It will be further understood that the terms "includes," "including,"
"comprises," and/or "comprising," when used in this specification, specify the presence
[00101] In one or more embodiments, the software module 109 is configured to generate audio, utilizing text to speech artificial intelligence, based on the content (e.g., news articles, videos, press releases, etc.) of the data provider 112 displayed on the user's data feed 104 (e.g., the business page of the data provider, or the data feed displaying data from multiple data providers). For instance, in one or more embodiments, the software module 109 may be configured to display an icon on the data feed 104 of the electronic device 108 of the user 110, which, when selected by the user, generates audio from a speaker of the user's electronic device 108 based on the text contained in the associated article or piece of content. The audio may contain several different "voices" and algorithms may be used to shuffle the voices or to create "playlists" of news articles and/or other media related content. Additionally, in one or more embodiments, the software module 109 is configured to display audiovisual pages that indicate to the user 110 how long they have been listening, the content's 'cover, and/or how long it will take for them to complete listening to an article (i.e., the time that has elapsed and/or the time remaining to completion).
[00102] FIGS. 8A-8G depict screenshots of a mobile application incorporating the algorithms and other functionality of the present disclosure.
[00103] In one or more embodiments, the algorithms, features, and/or methodologies described herein may be utilized by a content publisher (e.g., a newspaper) to generate digital recurring revenue (e.g., increased digital subscriptions and advertisements).
[00104] The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to limit the example embodiments described herein.
[00105] As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
[00106] It will be further understood that the terms "includes," "including," "comprises," and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
[00107] As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
[00108] Further, the use of "may" when describing embodiments of the present disclosure refers to one or more embodiments of the present disclosure.
[00109] Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present disclosure belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and/or the present specification, and should not be interpreted in an idealized or overly formal sense, unless expressly so defined herein.
[00110] While this invention has been described in detail with particular references to exemplary embodiments thereof, the exemplary embodiments described herein are not intended to be exhaustive or to limit the scope of the invention to the exact forms disclosed. Persons skilled in the art and technology to which this invention pertains will appreciate that alterations and changes in the described structures and methods of assembly and operation can be practiced without meaningfully departing from the principles, spirit, and scope of this invention, and equivalents thereof.
Claims (26)
1. A system for addressing disinformation in news, the system comprising:
a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to:
display, on a display of an electronic device, a content feed of a social news network comprising a plurality of news items received from one or more news organizations; and restrict the electronic device from publishing news items to the content feed without prior authorization.
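The publishing restriction recited in claim 1 can be sketched as an authorization gate on the feed. All names here (`NewsFeed`, `vet_publisher`, `AuthorizationError`) are hypothetical illustrations, not terms from the patent; the vetting step itself is assumed to happen out of band.

```python
# Hedged sketch of claim 1's restriction: only publishers that have received
# prior authorization (e.g., vetting) may publish items to the content feed.

class AuthorizationError(Exception):
    """Raised when an unvetted publisher attempts to publish."""

class NewsFeed:
    def __init__(self):
        self._vetted = set()  # publishers granted prior authorization
        self.items = []       # the displayed content feed

    def vet_publisher(self, publisher: str) -> None:
        """Grant prior authorization, e.g., after an editorial vetting process."""
        self._vetted.add(publisher)

    def publish(self, publisher: str, item: str) -> None:
        """Append a news item to the feed, rejecting unvetted publishers."""
        if publisher not in self._vetted:
            raise AuthorizationError(f"{publisher} is not authorized to publish")
        self.items.append((publisher, item))

feed = NewsFeed()
feed.vet_publisher("Example Daily")
feed.publish("Example Daily", "City council passes budget")
print(feed.items)  # → [('Example Daily', 'City council passes budget')]
```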
2. The system of claim 1, wherein the plurality of news items comprises content selected from the group consisting of a news article, a news video, a news image, and news audio.
3. The system of claim 1 or claim 2, wherein the instructions, when executed by the processor, further cause the processor to display, on the display of the electronic device, the news items received only from vetted news publishers.
4. The system of claim 1 or claim 2, wherein the instructions, when executed by the processor, further cause the processor to send a notification, to the electronic device, in response to a determination that a news item of the plurality of news items that was displayed on the display of the electronic device contains disinformation.
5. The system of claim 1 or claim 2, wherein the instructions, when executed by the processor, further cause the processor to display a notification, on the display of the electronic device, in response to the electronic device being associated with a subscriber of a publisher and a new news item being published by the publisher.
6. The system of claim 1 or claim 2, wherein the instructions, when executed by the processor, further cause the processor to display a comment, a "like,"
a share, or a play associated with a news item of the plurality of news items received from the electronic device.
7. The system of claim 1 or claim 2, wherein the software instructions, when executed by the processor, further cause the processor to display, on the display of the electronic device, a cover of one of the plurality of news items.
8. The system of claim 1 or claim 2, wherein the software instructions, when executed by the processor, further cause the processor to display, on the display of the electronic device, an advertisement portal for vetted publishers, businesses, and consumers.
9. The system of claim 1 or claim 2, wherein the instructions, when executed by the processor, further cause the processor to publish a news item from a vetted news organization on a user's story, transmit the news item to another social media site, or share the news item with the user's friends.
10. The system of claim 1 or claim 2, wherein the instructions, when executed by the processor, further cause the processor to moderate the content feed in response to a notification from a publisher that one or more of the plurality of news items on the content feed contains disinformation.
11. A system for addressing disinformation in news, the system comprising:
a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to:
determine, utilizing a bias detection model, a bias score for each component of a news item from a news organization; and display, on a display of an electronic device, components of the news item having a bias score lower than a threshold value.
12. The system of claim 11, wherein the news item comprises a news article, and the components are a plurality of sentences of the news article.
13. The system of claim 12, wherein the instructions, when executed by the processor, cause the processor to:
split the news article into the plurality of sentences; and determine an emotion associated with each sentence of the plurality of sentences.
14. The system of claim 13, wherein the instructions, when executed by the processor, cause the processor to determine a polarity in a range from -1 to 1 to determine the emotion associated with each sentence of the plurality of sentences, and wherein 1 is a positive statement and -1 is a negative statement.
15. The system of claim 13, wherein the instructions, when executed by the processor, cause the processor to determine a subjectivity score in a range from 0 to 1 to determine the emotion associated with each sentence of the plurality of sentences, and wherein 0 is a factual statement and 1 is a personal opinion.
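The sentence-level scoring of claims 13-15 (polarity in [-1, 1], subjectivity in [0, 1]) can be sketched as follows. The tiny word lists are toy stand-ins for a trained sentiment model, and all names are illustrative, not the patent's implementation.

```python
import re

# Toy sketch of claims 13-15: split an article into sentences, then score each
# with a polarity in [-1, 1] (1 = positive, -1 = negative) and a subjectivity
# in [0, 1] (0 = factual statement, 1 = personal opinion). The word lists below
# are hypothetical placeholders for a real model.

POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "terrible", "awful"}
SUBJECTIVE = {"think", "believe", "feel", "terrible", "great"}

def split_sentences(article: str) -> list[str]:
    """Split on whitespace following sentence-final punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s.strip()]

def polarity(sentence: str) -> float:
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def subjectivity(sentence: str) -> float:
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    if not words:
        return 0.0
    return min(1.0, sum(w in SUBJECTIVE for w in words) / len(words) * 5)

article = "The budget passed. I think it is a terrible plan."
for s in split_sentences(article):
    print(s, polarity(s), subjectivity(s))
```

The first sentence scores as neutral and factual (0.0, 0.0); the second as negative and opinionated.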
16. The system of any one of claims 12-15, wherein the instructions, when executed by the processor, further cause the processor to:
recursively apply the bias detection model to the article including a first application of the bias detection model and a second application of the bias detection model;
remove at least one sentence of the article before the first application of the bias detection model;
reintroduce the at least one sentence to the article after the first application of the bias detection model and before the second application of the bias detection model; and determine a change in the bias score between the first application of the bias detection model and the second application of the bias detection model.
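The recursive procedure of claim 16 (score with a sentence held out, reintroduce it, rescore, and measure the change) can be sketched with a stand-in model. The word-list "bias detector" here is a hypothetical placeholder; only the hold-out/reintroduce/delta structure mirrors the claim.

```python
# Sketch of the claim-16 procedure: a first application of the bias detection
# model with a sentence removed, a second application with it reintroduced, and
# the change in bias score between the two applications.

BIASED_WORDS = {"outrageous", "disgraceful", "heroic"}  # illustrative only

def bias_score(sentences: list[str]) -> float:
    """Fraction of sentences containing a flagged word (a toy bias model)."""
    if not sentences:
        return 0.0
    hits = sum(
        any(w.strip(".,!?").lower() in BIASED_WORDS for w in s.split())
        for s in sentences
    )
    return hits / len(sentences)

def sentence_bias_delta(sentences: list[str], index: int) -> float:
    held_out = sentences[:index] + sentences[index + 1:]
    first_pass = bias_score(held_out)    # first application: sentence removed
    second_pass = bias_score(sentences)  # second application: sentence reintroduced
    return second_pass - first_pass      # change attributable to that sentence

article = [
    "The vote was 5-2.",
    "An outrageous betrayal of voters.",
    "Debate lasted an hour.",
]
print(sentence_bias_delta(article, 1))  # positive: this sentence raises the score
```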
17. A system for addressing disinformation in news, the system comprising:
a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to:
determine, utilizing a claim detection model, a claim score for each component of a news item from a news organization; and output components of the news item having a claim score lower than a threshold value.
18. The system of claim 17, wherein the news item comprises a news article, and the components comprise a plurality of sentences of the news article.
19. The system of claim 18, wherein the instructions, when executed by the processor, cause the processor to request articles from a fake news corpus to determine the claim score.
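The filtering recited in claims 17-18, and the analogous hate-score (claim 20) and summary-score (claim 23) filters, reduce to the same pattern: score each component with a model and output those below a threshold. The scorer below is a hypothetical stand-in, not the patent's claim-detection model.

```python
# Generic score-and-threshold filter shared by the claim-score, hate-score,
# and summary-score systems. A real system would plug in a trained model.

def filter_components(sentences: list[str], score_fn, threshold: float) -> list[str]:
    """Return the sentences whose model score is below the threshold."""
    return [s for s in sentences if score_fn(s) < threshold]

# Hypothetical toy scorer: flags sentences containing assertive claim verbs.
ASSERTIVE = {"proves", "confirms", "guarantees"}

def claim_score(sentence: str) -> float:
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return 1.0 if words & ASSERTIVE else 0.0

article = ["This proves the vaccine fails.", "Officials met on Tuesday."]
print(filter_components(article, claim_score, 0.5))
# → ['Officials met on Tuesday.']
```

Swapping `claim_score` for a hate-speech or summarization scorer yields the claim-20 and claim-23 variants without changing the filter.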
20. A system for addressing disinformation in news, the system comprising:
a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to:
determine, utilizing a hate speech detection model, a hate score for each component of a news item from a news organization; and output components of the news item having a hate score lower than a threshold value.
21. The system of claim 20, wherein the news item comprises a news article, and the components comprise a plurality of sentences of the news article.
22. The system of claim 20, wherein the instructions, when executed by the processor, cause the processor to request articles from a hate news corpus to determine the hate score.
23. A system for addressing disinformation in news, the system comprising:
a non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to:
determine, utilizing a summarizer model, a summary score for each component of a news item from a news organization; and output components of the news item having a summary score lower than a threshold value.
24. The system of claim 23, wherein the news item comprises a news article, and the components comprise a plurality of sentences of the news article.
25. The system of claim 24, wherein the instructions, when executed by the processor, cause the processor to punctuate, with a post-processing pipeline, the plurality of sentences.
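The punctuation post-processing of claim 25 can be sketched as a minimal normalization pass over summary sentences. This is an illustrative simplification; a production pipeline might instead use a punctuation-restoration model.

```python
# Minimal sketch of a post-processing pipeline that punctuates summary
# sentences: capitalize each sentence and ensure terminal punctuation.

def punctuate(sentences: list[str]) -> list[str]:
    out = []
    for s in sentences:
        s = s.strip()
        if not s:
            continue  # drop empty fragments
        s = s[0].upper() + s[1:]
        if s[-1] not in ".!?":
            s += "."
        out.append(s)
    return out

print(punctuate(["the council approved the plan", "funding starts in march"]))
# → ['The council approved the plan.', 'Funding starts in march.']
```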
26. The system of claim 11, 17, 20, or 23, wherein the news item is on a website or a mobile application.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202063057171P | 2020-07-27 | 2020-07-27 | |
| US63/057,171 | 2020-07-27 | ||
| PCT/US2021/043229 WO2022026416A1 (en) | 2020-07-27 | 2021-07-26 | System and method for addressing disinformation |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CA3190303A1 true CA3190303A1 (en) | 2022-02-03 |
Family
ID=80036117
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CA3190303A Pending CA3190303A1 (en) | 2020-07-27 | 2021-07-26 | System and method for addressing disinformation |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4189553A4 (en) |
| CA (1) | CA3190303A1 (en) |
| MX (1) | MX2023001215A (en) |
| WO (1) | WO2022026416A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240070402A1 (en) * | 2022-08-31 | 2024-02-29 | Viettel Group | Method for factual event detection from online news based on deep learning |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7596606B2 (en) * | 1999-03-11 | 2009-09-29 | Codignotto John D | Message publishing system for publishing messages from identified, authorized senders |
| EP2756425B1 (en) * | 2011-10-14 | 2020-11-11 | Oath Inc. | Method and apparatus for automatically summarizing the contents of electronic documents |
| US10747837B2 (en) * | 2013-03-11 | 2020-08-18 | Creopoint, Inc. | Containing disinformation spread using customizable intelligence channels |
| US9972055B2 (en) * | 2014-02-28 | 2018-05-15 | Lucas J. Myslinski | Fact checking method and system utilizing social networking information |
| US12141878B2 (en) * | 2018-09-21 | 2024-11-12 | Kai SHU | Method and apparatus for collecting, detecting and visualizing fake news |
- 2021
- 2021-07-26 CA CA3190303A patent/CA3190303A1/en active Pending
- 2021-07-26 MX MX2023001215A patent/MX2023001215A/en unknown
- 2021-07-26 EP EP21849293.2A patent/EP4189553A4/en not_active Withdrawn
- 2021-07-26 WO PCT/US2021/043229 patent/WO2022026416A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| MX2023001215A (en) | 2023-05-03 |
| EP4189553A4 (en) | 2024-10-09 |
| WO2022026416A1 (en) | 2022-02-03 |
| EP4189553A1 (en) | 2023-06-07 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 2023-01-27 | EEER | Examination request | |