WO2024168324A1 - Computer-implemented, annotation-based document management and review system - Google Patents
- Publication number
- WO2024168324A1 (PCT/US2024/015291)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- region
- user interface
- graphical user
- annotation
- implemented
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/169—Annotation, e.g. comment data or footnotes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Definitions
- This disclosure relates generally to the field of annotating or extracting data content from a document or set of documents, and organizing the set of documents into a system, particularly when the extracted or annotated data needs to be stored, retrieved, or searched according to a pre-defined structure.
- the invention is particularly useful for cases in which the data to be extracted/annotated (a) needs to be stored according to a structured template,
- This disclosure also relates generally to the field of generating a written review report document about a specific source document or set of documents, whereby the report contains content generated by one or multiple users.
- “Document” means any structured or unstructured data set, across a variety of data formats.
- “Element of a document” or “document element” means any structural, textual, metadata, reference-based, graphical, media, or other type of component that is part of a document.
- “Document content” or “content” or just “data” is a document element that refers to a text, graphical, or media component present in a document. It can refer to specific entities such as variables, effects, scientific claims, elements of a legal case, traditional entities (people, places, things), etc. Content may take many forms: freeform text, but also fields in a dataset, entities in a database, regions of an image, timeframes in a video or audio recording, entities in a video, or signals in an audio clip, all of which are examples of content that can be annotated.
- “Document context” or “context” or “contextual element” is a document element that refers to the structural, metadata, or reference-based elements of a document, whereby those elements may be present in a document or outside of a document.
- “Structural element” or “structure element” is a document context that refers to the layout or other physical characteristics of the document content.
- The layout may be explicitly or implicitly defined.
- Section headings provide explicit indication that a structural element (in this case, a section of the document) has begun, and a subsequent heading indicates that the previous section structure element is complete.
- Other examples of explicit structure elements are tables of information that have a well-defined tabular structure along with a table name and description.
- Implicit structures are chunks of content that are distinguished visually in some way from surrounding content but with no explicit indication of intent. Some examples include: footnotes; text placed in margins, headers, or footers of a page in the document; insets of text blocks on a page distinguished with margins, or different fonts or colors from the primary text content.
- “Document structure” or “content structure” or “structure” refers to the interpretation of the layout of the structural elements of a document. This can include section hierarchies, page layout (e.g. margins, columns), locations of structure elements (e.g. sections, data tables, figures, footnotes, insets), etc.
- “Reference linkage” or “reference link” or “reference element” is a document context that provides linking information. There are two ends to a reference link: the referrer element, and the referee element. The referrer element is the thing doing the referring, while the referee element is the thing being referred to. At least one end of the reference link has to be associated with something in or about the document.
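- As a minimal sketch of the referrer/referee relationship described above, a reference link can be modeled as a pair of endpoints, with a check that at least one endpoint is tied to the document. The names `Endpoint`, `ReferenceLink`, and `in_document` are illustrative assumptions and do not appear in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One end of a reference link (hypothetical model)."""
    label: str
    in_document: bool  # True if the element is in or about the document


@dataclass(frozen=True)
class ReferenceLink:
    referrer: Endpoint  # the element doing the referring
    referee: Endpoint   # the element being referred to

    def __post_init__(self) -> None:
        # At least one end must be associated with the document.
        if not (self.referrer.in_document or self.referee.in_document):
            raise ValueError("at least one end must be tied to the document")


footnote_marker = Endpoint("footnote marker [1]", in_document=True)
external_source = Endpoint("external web page", in_document=False)
link = ReferenceLink(referrer=footnote_marker, referee=external_source)
```

- A link whose two ends both lie outside the document is rejected at construction time, which mirrors the constraint stated in the definition.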
- “Document metadata,” or data about the document, is a document context that includes information about a document such as when and where the document was published, its authors, the document’s size, provenance information, the domain it refers to, etc.
- “Content metadata,” which is data about the content, is a document context that includes information such as text length, font and style information, location in the document and on the page, etc.
- “Meta-content,” which is content about content, is a document context that provides summarization, description, interpretation, opinion, review, expansion or other narrative information about one or more content elements in the document.
- “Region of a graphical user interface” means a visually delineated region of a graphical user interface, such as a pane or window.
- “Region” and “pane” are used as synonyms for “region of a graphical user interface,” to facilitate readability.
- “Data field” is a type of document element that can be used to store or display data. Examples in a graphical user interface include text fields, selectors, or cells in a table.
- “Data value” or “annotation value” is a type of document element that is data. It is what is generally thought of as the content of a data field.
- “Content markup” or “content mark-up” refers to the visual enhancement of an element of the content of a document, whereby that element is made to appear more visible or prominent, for example through the addition of a highlight, a border, a different text or background color, etc.
- “Review report document” refers to a document containing organized data about a source document or set of documents, whereby this information is written for a specific audience, typically with the goal of being shared with that audience.
- “Location marker” is a textual or graphical element that points to a specific location in a document, such as a document page, a document section, a document paragraph, etc. If the location marker is text-based, it can represent the name of the document location it refers to either in full (e.g., “Section 2-1,” “page 1”) or in an abbreviated format (e.g., “p.1”).
- “Data unit” is a section of the content of a document.
- A data unit can correspond to a section such as an individual research study or a research dataset associated with that research project; in a legal document, a data unit can correspond to a section such as a description of a legal case.
- “Data annotation” refers to identifying a relevant content element within a document or dataset and recording an annotation pertaining to that content element.
- An annotation can take one of many forms including: providing a comment, providing an explanation; classifying, codifying, tagging, or augmenting content; capturing metadata, managing state-related content or other meta-content, that is content about content.
- Annotations for capture and presentation include: content metadata, content markup, content extraction, content notation, and content review.
- Other types of annotations such as state management, tagging, coding, and augmentation also apply in analogous ways to the invention, though not exhaustively described.
- “Data extraction” or “content extraction” is a type of data annotation that identifies within a document at least one content element (e.g., a text fragment, table column, graphic, etc.) that is deemed a good fit for at least one data field and records that content element as a data value for that data field.
- The process of data extraction includes explicitly adding previously identified context to the document content. This task typically, although not always, involves systematically going through an extraction form or template and identifying, for each data extraction field, the relevant entities in the document text that fit the criterion for that field. This task can, in principle, be fully automated, because it relies on a comparison between document text and a criterion.
- The data fields for data extraction may be part of a field assembly that includes contextual information about the field and the field’s content.
- The field assembly may represent the concept of Title of the document as the context, and the content would then be the actual title (e.g., “The Sun Also Rises”), thereby adding the context Title to the content “The Sun Also Rises.”
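- The field assembly described above can be sketched as a small record that pairs a context with its content. This is an illustrative model only; the class name `FieldAssembly` and its attributes are assumptions rather than structures defined in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class FieldAssembly:
    """Pairs a field's context with the extracted content (hypothetical model)."""
    context: str        # what the field represents, e.g. "Title"
    content: str = ""   # the extracted data value

    def is_filled(self) -> bool:
        # An assembly counts as filled once it holds non-blank content.
        return bool(self.content.strip())


title = FieldAssembly(context="Title")
title.content = "The Sun Also Rises"
```

- Keeping the context alongside the value is what allows an extraction template to report which fields are still empty.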
- The annotation concepts of data coding and tagging are closely related to data extraction.
- “Data notation” or “content notation” is a type of annotation that provides new context (e.g. associating an opinion, references, additional facts, or explanation) to document content.
- This task typically, although not always, involves going through all or part of the document content and identifying relevant document content elements that can be used for evaluating the document along one or more criteria. This task may be difficult to fully automate because it relies on the expertise of a human annotator, often a subject matter expert (SME) of the content in the document(s).
- “Data augmentation” is a type of annotation that explicitly alters in some way the content referred to in annotations such as extractions or notations, for example through typographical corrections or insertion or deletion of data from the referred content.
- “State management data” is a metadata-type annotation that identifies the current or historical states of referenced content or another annotation.
- A state is computer system dependent; some (by no means comprehensive) examples of annotation states pertaining to the data include: “needs review”, “needs verification”, “validated”, “reviewed”, and “incorrect data”.
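- One way to picture state management data is an enumeration of states plus a history list, as sketched below. The state values are the examples listed above; the class names `AnnotationState` and `StateHistory` are assumptions introduced for illustration.

```python
from enum import Enum


class AnnotationState(Enum):
    NEEDS_REVIEW = "needs review"
    NEEDS_VERIFICATION = "needs verification"
    VALIDATED = "validated"
    REVIEWED = "reviewed"
    INCORRECT_DATA = "incorrect data"


class StateHistory:
    """Tracks the current and historical states of one annotation."""

    def __init__(self, initial: AnnotationState) -> None:
        self._states = [initial]

    def transition(self, new_state: AnnotationState) -> None:
        # Append rather than overwrite so that earlier states stay available.
        self._states.append(new_state)

    @property
    def current(self) -> AnnotationState:
        return self._states[-1]

    @property
    def history(self) -> list:
        return list(self._states)


state = StateHistory(AnnotationState.NEEDS_REVIEW)
state.transition(AnnotationState.REVIEWED)
```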
- “Data Tagging” or “Tagging” is a type of annotation closely related to data extraction in that previously defined context is associated with a document content element.
- Tags could be created ahead of time or at the same time as the association occurs.
- A tag may also be represented with an icon image, such as an emoticon or symbol, as well as with words or digits.
- Emoticon tagging is more often used with metadata tagging, especially state representation (e.g., a “check mark” image might be used to indicate a state of “reviewed”) but can be used anywhere tags are applicable.
- Tagging is an especially useful addition to extraction techniques because it lets users identify important contexts, classify variables, communicate information to other project team members (e.g., “high priority”, “fix this” or “please review”), and keep domain-specific terminology consistent by reducing typographical errors.
- “Data Coding” is a type of data extraction and tagging that provides domain-specific context to data content.
- A dictionary of codes and their meaning can be stored within a computer system and presented in a user interface (such as with a searchable selector of code tags) and used in the same manner as the data extraction examples, or the dictionary of codes can be kept by the users outside of the computer system, with coding performed by creating tags as needed when the user tags (codes) the relevant content.
- An example from the medical field is the code “N02BE01” representing the drug Paracetamol (Acetaminophen) as defined in the Anatomical Therapeutic Chemical (ATC) Classification System, a coding dictionary developed by the World Health Organization (WHO) for the classification of drugs and other medical products.
- A user may highlight content containing or referencing “Paracetamol” or “Acetaminophen” and tag it with “N02BE01”.
- Data coding is particularly valuable for domains that require high consistency and accuracy in their annotation processes, such as for health care processes and in machine learning. Coding dictionaries can be even more narrowly domain-specific, such as a set of codes designed for a particular project.
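- A stored coding dictionary of the kind described above could be consulted as in the following sketch, which suggests codes for a highlighted passage. Only the Paracetamol/Acetaminophen entry comes from the text; the dictionary layout and the function name `suggest_codes` are assumptions.

```python
# Hypothetical project dictionary: code -> terms that the code stands for.
# Only the N02BE01 entry is taken from the text above.
CODE_DICTIONARY = {
    "N02BE01": {"paracetamol", "acetaminophen"},
}


def suggest_codes(highlighted_text: str) -> list:
    """Return the codes whose terms appear in the highlighted content."""
    lowered = highlighted_text.lower()
    return sorted(
        code
        for code, terms in CODE_DICTIONARY.items()
        if any(term in lowered for term in terms)
    )


codes = suggest_codes("Patients received 500 mg Acetaminophen daily.")
```

- Matching here is a plain case-insensitive substring test; a production system would likely need tokenization and synonym handling, which this sketch leaves out.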
- “Document Navigation Region” or “Region 100” (Fig. 1) refers to the representation of the available documents in the corpus (e.g., assigned to a project). This region is interconnected with the other regions identified. It can provide an interactive representation of document-related information such as metadata, statistics, the state of the document in a corpus of documents and, most relevantly, a concise representation of the selected content or annotations for each document.
- “Document Content Region” or “Region 200” refers to the representation of the content of a source document being annotated (e.g., extraction, notation). This region is interconnected with the other regions identified. It provides a method for a user to select (identify) content data within a document to annotate as well as view previously identified content within the context of all the document’s content.
- “Document Annotation Region” or “Region 300” refers to a representation of the annotated data. This region is interconnected with the other regions identified. It is designed for displaying structured information, such as one or more forms, tables, or graphs, and may provide a method for a user to interact with and edit extracted and annotated data.
- “Document Notation Region 400” refers to a visual representation of a specific case of a Document Annotation Region, shown in a region of a graphical user interface, such as a window or pane, distinct from other regions. This region is often structured as a series of notation/comment boxes, and typically has the appearance of a form with entries. The region is interconnected with the other regions described in this disclosure. It provides a method for a user to interact with and comment on or describe content or other annotated data. This region highlights a specific case of Document Annotation Region, one that is focused more on data notation than on other annotations such as extraction, coding or tagging.
- “Document Review Report Region 500” refers to the visual representation of a specific case of a Document Annotation Region, primarily comprising data compiled from one or more other Annotation Regions, such as a Document Notation Region 400, into a single, continuous document that has the appearance of a review report document (e.g., appears as text, markup, HTML, PDF, etc.). It is shown in a region of a graphical user interface, such as a window or pane, distinct from other regions. It provides a method for a user to review, interact with, and edit the annotations, to ultimately produce a human reader-friendly review report that can be shared with the intended audience.
- This invention pertains to two different, but conceptually related tasks: (1) extracting relevant content from a set of one or more documents, largely based on a set of pre-defined data extraction criteria or fields, and (2) generating a narrative report that provides a review about the content of a set of one or more documents.
- the commonality between these tasks is that both of them are easier to perform if they comprise a process of annotating content from the document set, the annotation being in the service of either extraction or review.
- Step 3 of the data extraction process can be conducted faster and with higher accuracy if an explicit linkage is created between the identified document content element and its corresponding data extraction field, which can more quickly help identify extractions that are missing, incorrect, ambiguous, etc.
- The typical process of generating a narrative review report document that references content from one or more other documents entails the following steps: (1) identifying, inside a source document (or set of documents), at least one content element about which one wishes to make a notation, such as a comment; (2) creating at least one notation, in at least one notation category, for each identified content element in the source document or set of documents; such notations can be created in multiple locations, for example: (a) inside the source document, typically to the side of the document text, next to the location of the relevant content (an example of this are comments that users can generate in Word or Google Docs), or (b) inside a separate document, which serves as the review report document; (3) compiling the notations into a review report (i.e., if the notations are generated inside the source document, then this step involves copying, across the source documents, all the relevant notations, and compiling this information into a review report document; it may also involve organizing this compiled information, and, if
- The present disclosure makes it faster and more accurate to perform one or more of the steps described above, either for extracting data from a set of documents, or for generating a written review report document that references content from one or more other documents. It does so largely by establishing relationships (i.e., ‘linkages’) between the different elements generated during the data annotation task, in combination with automating certain steps (for review report generation), as described below:
- The invention makes the content classification step faster by utilizing two side-by-side regions of a graphical user interface (a Document Content Region 200 and a Document Annotation Region 300) that are interlinked, such that an operation performed in one region triggers a subsequent, classification-supporting operation in the other region.
- a particular visual indicator for example, a specific color
- the invention makes accessing and reviewing the annotated data faster, which can be done at two levels: (1) Within a document, the invention utilizes mirror linkages between the Document Content Region 200 and Document Annotation Region 300, whereby (a) the same element in the Document Annotation Region 300 can be linked to several markups in the Document Content Region 200, and a location indicator gets created in the Document Annotation Region 300 for each markup, and (b) performing an action on a content markup or annotation data in a region (for example, clicking on it or deleting it) results in the same/equivalent action being applied to the corresponding element in the other region; (2) Across documents, the invention utilizes a system of three interlinked regions of a graphical user interface (Document Content Region 200, Document Annotation Region 300, and Document Navigation Region 100) and allows for a full information exchange loop to be completed among them, using hyperlinked data elements.
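- The within-document mirror linkage described above can be sketched as two dictionaries kept in step: one for mark-ups in Region 200 and one for annotations in Region 300. This is a simplified illustration under assumed names (`Markup`, `Annotation`, `MirrorLinkage`) and an assumed page-based location indicator; it is not the disclosed implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Markup:
    markup_id: str
    page: int


@dataclass
class Annotation:
    value: str
    markups: dict = field(default_factory=dict)  # markup_id -> Markup

    def location_indicators(self) -> list:
        # One indicator per linked mark-up, in page order.
        ordered = sorted(self.markups.values(), key=lambda m: m.page)
        return [f"p.{m.page}" for m in ordered]


class MirrorLinkage:
    """Keeps Region 200 mark-ups and Region 300 annotations in step."""

    def __init__(self) -> None:
        self.content_region = {}     # markup_id -> Markup (Region 200)
        self.annotation_region = {}  # field key -> Annotation (Region 300)

    def annotate(self, key: str, value: str, markup: Markup) -> None:
        annotation = self.annotation_region.setdefault(key, Annotation(value))
        annotation.markups[markup.markup_id] = markup
        self.content_region[markup.markup_id] = markup

    def delete_markup(self, markup_id: str) -> None:
        # Deleting in Region 200 removes the matching indicator in Region 300.
        self.content_region.pop(markup_id, None)
        for annotation in self.annotation_region.values():
            annotation.markups.pop(markup_id, None)

    def delete_annotation(self, key: str) -> None:
        # Deleting in Region 300 removes every linked mark-up in Region 200.
        annotation = self.annotation_region.pop(key, None)
        if annotation is not None:
            for markup_id in annotation.markups:
                self.content_region.pop(markup_id, None)


ui = MirrorLinkage()
ui.annotate("sample_size", "120 participants", Markup("m1", page=3))
ui.annotate("sample_size", "120 participants", Markup("m2", page=7))
indicators_before = ui.annotation_region["sample_size"].location_indicators()
ui.delete_markup("m1")
indicators_after = ui.annotation_region["sample_size"].location_indicators()
```

- In this sketch the same annotation is linked to two mark-ups, so it carries two location indicators until one mark-up is deleted.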
- Issues with the typical written review report document generation process include the following: if the annotations are created inside the source document, typically to the right of the document text (which makes it easy to write notations while reading or inspecting the document), then, after all the notations have been created, this method places on the user the onus to copy each (relevant) comment, compile them into a separate document, and add location references when relevant (such a method is inefficient and can lead to errors when referencing the locations in the source document); if the notations are created inside a separate document, which serves as the review report document, the user must work with two different documents side-by-side (i.e., the source document and the review report document), and if the two documents are accessed via different software (e.g., PDF vs. Word), switching back-and-forth between the two documents can be tedious for the user and can lead to errors when referencing the locations in the source document.
- The invention provides the following benefits that make the written review report document generation process more efficient: for a review report document creator, it simplifies the process of converting one’s notations pertaining to a source document (or set of documents) into a compiled review report document, with relevant location references, and ensures accurate location referencing; for the review report document audience, it simplifies the process of checking the content of the review report document against the relevant locations in the source document.
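- The compilation step that the invention aims to simplify can be illustrated with a short sketch that groups notations by category and appends a location marker to each. The `Notation` fields, the abbreviated “p.N” marker, and the plain-text layout are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Notation:
    category: str   # notation category, e.g. "Methods"
    comment: str
    page: int       # location of the annotated content in the source document


def compile_report(notations: list) -> str:
    """Group notations by category and append a location marker to each."""
    by_category = {}
    for notation in notations:
        by_category.setdefault(notation.category, []).append(notation)

    lines = []
    for category, items in by_category.items():
        lines.append(category)
        for item in sorted(items, key=lambda n: n.page):
            lines.append(f"  {item.comment} (p.{item.page})")
    return "\n".join(lines)


report = compile_report([
    Notation("Methods", "Sample size is not justified.", page=4),
    Notation("Results", "Table 2 omits confidence intervals.", page=9),
    Notation("Methods", "Randomization procedure is unclear.", page=3),
])
```

- Because each location marker is derived from the stored notation rather than typed by hand, the references in the compiled report cannot drift from the locations that were actually annotated.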
- Figure 1 is an illustration of the Interlinked Graphical User Interface 901, in which Region 100 and Region 200 are connected via Reference Linkage 812;
- Figure 2 is an alternative illustration of the Interlinked Graphical User Interface 901, in which Region 100 and Region 200 are connected via Reference Linkage 814;
- Figure 3 is an alternative illustration of the Interlinked Graphical User Interface 901, in which Region 300 contains more than one annotation value;
- Figure 4 is an illustration of Region 200 and Region 300 of Interlinked Graphical User Interface 901 and their corresponding linkage when Region 200 contains more than one data unit;
- Figure 5 is an illustration of Interlinked Graphical User Interface 902, showing Region 200 and Region 400 and their Linkage 840;
- Figure 6 is an illustration of Interlinked Graphical User Interface 902, showing Region 200 and Region 400 and their Linkage 842;
- Figure 7 is an illustration of Interlinked Graphical User Interface 902, showing Region 200 and Region 500 and their Linkage 850;
- Figure 8 is an illustration of Interlinked Graphical User Interface 902, showing Region 200 and Region 500 and their Linkage 852.
- A computer-implemented, annotation-based document management system 10, the system comprising three interlinked regions of a graphical user interface 901, can be constructed from the following components: a first region of the graphical user interface 901 (Document Navigation Region 100, also referred to as Region 100) that contains an identifier 110 for a first document 6110; a second region of the graphical user interface 901 (Document Content Region 200, also referred to as Region 200) that displays some or all of the content 250 of the first document 6110, wherein a first content element 262 of the content 250 of the first document 6110 is associated with a first content mark-up 222; a third region of the graphical user interface 901 (Document Annotation Region 300, also referred to as Region 300) that contains a first field 362 that can be populated with a first annotation value 6110-322 (322) corresponding to the first content mark-up 222 displayed in Region 200 of the graphical user interface 901; a first reference linkage 812
- Region 100 is shown above Region 200. In an alternative embodiment of this invention, Region 100 and Region 200 are shown in a different arrangement, for example side-by-side. In a preferred embodiment of this invention, Region 100 is shown above Region 300. In an alternative embodiment of this invention, Region 100 and Region 300 are shown in a different arrangement, for example side-by-side.
- Region 200 is shown side-by-side with Region 300, either to the right or to the left of Region 300.
- Users choose which region arrangement they prefer, whereby the assumption is that right-handed users prefer Region 300 to be shown on the right and left-handed users prefer Region 200 to be shown on the left.
- Region 200 is shown above or below Region 300, particularly (but not exclusively) for cases where there is limited horizontal space (e.g., on a mobile device in portrait orientation).
- Region 100 represents the identifier 110 for the document 6110 in different ways, for example as a document name, a document number, an excerpt from the document content 250, etc.
- The identifier 110 for a first document 6110, an identifier for a second document 6112, and an identifier for a third document 6114 are listed in a structured format, for example as part of a tabular format that contains columns, rows, or both, or as part of some other type of structured format.
- The content element 262 is a fragment of the content 250 of document 6110, and includes a text fragment (such as a word, phrase, sentence, paragraph, numerical value), a graphic representation, a media element, or some other type of document content.
- Content mark-up 222 is generated either manually, by a user who interacts with the content 250 of document 6110, or through other means, for example by configuring the computer-implemented, annotation-based document management system 10 to automatically identify and mark up certain elements within the content 250 of document 6110, based on a first rule or criterion.
- A first rule or criterion can specify, for example, what element of the content 250 the system should look for, how to identify any relevant elements, which portion of the relevant elements to mark up, etc.
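- A rule of the kind just described could, for instance, be expressed as a regular expression plus a capture group that selects the portion to mark up. The sketch below makes that assumption; the disclosure does not state that rules are regular expressions, and `MarkupRule` and `auto_markup` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class MarkupRule:
    """What to look for and which part of each match to mark up."""
    pattern: str      # how to identify relevant elements
    group: int = 0    # which portion of the match to mark up (0 = whole match)


def auto_markup(content: str, rule: MarkupRule) -> list:
    """Return (start, end, text) spans that the rule selects for mark-up."""
    return [
        (m.start(rule.group), m.end(rule.group), m.group(rule.group))
        for m in re.finditer(rule.pattern, content)
    ]


content = "We enrolled n = 120 adults; a pilot used n = 15."
rule = MarkupRule(pattern=r"n\s*=\s*(\d+)", group=1)
spans = auto_markup(content, rule)
```

- Each returned span carries character offsets into the content, so the same result could drive both the highlight in Region 200 and an automatically populated value in Region 300.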
- Region 200 displays the content of more than one document, for example by simultaneously showing the content of the first document 6110 and the second document 6112 side-by-side, or by allowing users to navigate from the content of the first document 6110 to the content of the second document 6112 via a first navigation aid such as a hyperlinked list, a navigation tab, etc., wherein the first navigation aid is located either inside or outside of Region 200.
- The first field 362 can be of multiple types, for example: a text entry field (or ‘text field’), a multiple-choice field, a dropdown field, a slider, combinations thereof, etc.
- The first field 362 is used for populating, entering, editing, or displaying the first annotation value 6110-322.
- The first annotation value 6110-322 can be of multiple annotation types, for example a manual or automatic extraction, a notation such as a commentary, a data augmentation, etc.
- The annotation value 6110-322 is populated into the first field 362 either manually, by a user, or through other means, for example by configuring the computer-implemented, annotation-based document management system 10 to automatically populate a value in first field 362 based on a second rule or criterion related to the content 250 of document 6110 or to the context of document 6110.
- The first rule or criterion overlaps at least partially with the second rule or criterion, as they both refer to the same element of the document 6110.
- If the computer-implemented, annotation-based document management system 10 is configured to automatically generate both the content mark-up 222 and its associated annotation value 6110-322, then the computer-implemented, annotation-based document management system 10 is also configured to automatically generate the reference linkage 822 that connects the content mark-up 222 to annotation value 6110-322.
- Region 100 also comprises a first representation for the first annotation value 6110-322 (representation 122), wherein representation 122 shows either the actual annotation value 6110-322 or an indicator for that value, such as an abbreviation, symbol, icon, etc.
- The field 360 in Region 300 shows a dedicated field name (which is useful for data extraction tasks, in which the data extraction typically conforms to a pre-specified template with pre-specified field names, and the system user needs to know exactly which field to enter an annotation value into).
- The field 320 in Region 300 does not have a field name (which is useful for notation tasks in which the system user typically creates the first round of the notations without assigning the notations to pre-specified field names).
- The field 360 and its corresponding annotation value 6110-320 are combined and stored as a first annotation field assembly 370.
- Regions 100, 200, and 300 are interlinked, such that a user (or a combination of user and computer system) performing an action in one region affects what content or data is displayed or populated in at least one other region.
- The first reference linkage 812 connects the identifier 110 for the first document 6110 in Region 100 with the content 250 of the first document 6110 in Region 200.
- The reference linkage 812 is interactive and is configured in such a way that interacting with the identifier 110 for the first document 6110 in Region 100 causes some or all of the content 250 of the first document 6110 to be displayed in Region 200.
- The interaction with the identifier 110 is initiated in multiple ways, such as manually, by a user of the computer-implemented, annotation-based document management system 10, or automatically, by the computer-implemented, annotation-based document management system 10.
- A user interacts with the document identifier 110 in Document Navigation Region 100 in multiple ways, for example by clicking on it or hovering over it, which causes the content for document 6110 to load or become visible in Region 200, displayed as a PDF file, text file, HTML file, or some other type of file that shows the document content.
- The document identifier 110 is accompanied by a corresponding first textual or graphical referrer element 160 that indicates the presence of a reference linkage, for example via a text-based call to action such as “click here,” “open,” “view” or an icon such as a link or document icon.
- The computer-implemented, annotation-based document management system 10 comprises a reference linkage 814 that connects the textual or graphical referrer element 160 in Region 100 to Region 200.
- The reference linkage 814, which represents an alternative embodiment of the first linkage between Region 100 and Region 200 (and hence is an alternative to the reference linkage 812 in Figure 1), is interactive and is configured in such a way that interacting with the textual or graphical referrer element 160 corresponding to the first document 6110 in the Document Navigation Region 100 causes some or all of the content of the first document 6110 to be displayed in the Document Content Region 200.
- The interaction with the textual or graphical referrer element 160 is initiated in multiple ways, such as manually, by a user of the computer-implemented, annotation-based document management system 10, or automatically, by the computer-implemented, annotation-based document management system 10.
- A user interacts with the textual or graphical referrer element 160 in Document Navigation Region 100 in multiple ways, for example by clicking on it or hovering over it, which causes the content 250 for document 6110 to load or become visible in Region 200.
- The computer-implemented, annotation-based document management system 10 also comprises a linkage from Region 200 to Region 100, wherein loading the content 250 for document 6110 in Region 200 makes visually salient the identifier 110 for that document in Region 100, for example by changing the font for the document identifier 110 to bold, by changing the background color for the identifier, etc.
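- The two directions of the Region 100/Region 200 linkage described above can be sketched as a single handler: interacting with an identifier loads the document content, and loading the content makes the identifier salient. The class `NavigationLinkage`, its attributes, and the sample corpus are assumptions for illustration only.

```python
from typing import Optional


class NavigationLinkage:
    """Links document identifiers in Region 100 to content in Region 200."""

    def __init__(self, corpus: dict) -> None:
        self.corpus = corpus                              # identifier -> content
        self.displayed_content: Optional[str] = None      # shown in Region 200
        self.salient_identifier: Optional[str] = None     # emphasized in Region 100

    def interact(self, identifier: str) -> None:
        # Forward linkage: interacting with an identifier loads its content.
        self.displayed_content = self.corpus[identifier]
        # Reverse linkage: loading content makes the identifier salient
        # (for example, rendered in bold by the user interface layer).
        self.salient_identifier = identifier


nav = NavigationLinkage({
    "doc-6110": "Content of the first document.",
    "doc-6112": "Content of the second document.",
})
nav.interact("doc-6110")
nav.interact("doc-6112")
```

- Only one identifier is salient at a time in this sketch, so selecting the second document moves the emphasis away from the first.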
- the second reference linkage 822 connects the content mark-up 222 in Region 200 with the first annotation value 6110-322 in Region 300.
- the reference linkage 822 is interactive and is configured in such a way that interacting with the first content mark-up 222 in Region 200 causes the first annotation value 6110-322 to be displayed or to become more visually salient in Region 300.
- the interaction with the first content mark-up 222 is initiated in multiple ways, such as manually, by a user of the computer-implemented, annotation-based document management system 10, or automatically, by the computer-implemented, annotation-based document management system 10.
- the visual salience of annotation value 6110-322 is increased via system responses such as: causing Region 300 to scroll to the location where the annotation value 6110-322 is located; causing the cursor to become positioned at or next to the location where the annotation value 6110-322 is located; causing the visual appearance of the annotation value 6110-322 to change, for example through a change in background color or font styling; if annotation value 6110-322 is located inside an accordion or tab, causing the accordion or tab to open; or other system responses.
- the annotation value 6110-322 can become more visually salient in multiple ways, comprising the following: if the data category associated in the first field 322 is located inside an accordion that may be either open or closed, the system opens the accordion; if the data category associated in the first field 322 is located inside a tab that may be either selected or unselected, the system selects that tab.
- Region 200 and Region 300 are further configured to track with each other when scrolling, so that the portion of the content 250 shown in Region 200 determines what fields or annotation values are shown in Region 300, and the fields or annotation values shown in Region 300 determine what portion of the content 250 is shown in Region 200.
- the first content mark-up 222 and the first annotation value 6110-322 are further associated in such a way that making changes to the content mark-up 222 automatically triggers a corresponding change in the first annotation value 6110-322.
- Changes to the content mark-up 222 include changing which element of the content 250 of the document 6110 the content mark-up 222 refers to, deleting the content mark-up 222, and other types of changes.
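The association between a content mark-up and its annotation value can be sketched as a simple observer relationship. This is a minimal illustration only, not the implementation described in the disclosure: the class names `ContentMarkup` and `AnnotationValue`, the character-offset representation of a mark-up, and the sample text are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnnotationValue:
    """Hypothetical stand-in for an annotation value shown in Region 300."""
    text: Optional[str] = None

    def on_markup_changed(self, new_text: Optional[str]) -> None:
        # None signals that the linked mark-up was deleted.
        self.text = new_text


@dataclass
class ContentMarkup:
    """Hypothetical mark-up covering document_text[start:end] in Region 200."""
    start: int
    end: int
    listeners: list = field(default_factory=list)

    def retarget(self, document_text: str, start: int, end: int) -> None:
        # Point the mark-up at a different content element and notify listeners.
        self.start, self.end = start, end
        for notify in self.listeners:
            notify(document_text[start:end])

    def delete(self) -> None:
        # Deleting the mark-up propagates a deletion to every linked value.
        for notify in self.listeners:
            notify(None)


document_text = "Participants were 120 adults recruited online."
markup = ContentMarkup(start=18, end=28)
value = AnnotationValue(text=document_text[18:28])
markup.listeners.append(value.on_markup_changed)

markup.retarget(document_text, 18, 21)  # value.text now mirrors the new span
```

Calling `markup.delete()` afterwards would clear `value.text`, mirroring the deletion case listed above.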
- the computer-implemented, annotation-based document management system 10 is further configured in such a way as to make the generation of the first content mark-up 222, of the first annotation value 6110-322, and of reference linkage 822 require the minimum amount of effort on the part of the system user.
- field 362 pertains to a data extraction task (manual or automated classifying, coding or otherwise providing a specific context to document content) and is a text entry field
- the system assumes that the user wants to record a first relevant element 262 of the content of document 6110 (typically a text fragment from the document) in its exact or closely matching form.
- the content mark-up 222 gets automatically generated and positioned on top of, underneath, to the side of, or in some other location in the vicinity of the content element 262,
- the document content element 262 gets automatically populated into the field 362, thereby generating annotation value 6110-322.
- the reference linkage 822 gets automatically generated, which connects the content mark-up 222 and the annotation value 6110-322.
- annotation value 6110-322 gets automatically generated when the user interacts with an element associated with field 362, such as the name of field 362 or an icon or symbol placed in the vicinity of field 362.
- field 362 is a type of field that contains pre-specified options, such as a multiple-choice or a drop-down field
- the user indicating which field option the document content element 262 should be associated with automatically creates a linkage between the document content element 262 and the field option indicated by the user
- the system is further configured in such a way that, if a user first enters a text into field 362 or indicates an option for field 362 and then proceeds to select the content element 262 in Region 200, a content mark-up 222 gets automatically generated for the content element 262, and a linkage gets automatically generated between the content markup 222 and the field option indicated by the user.
- field 362 pertains to a data notation task (where users want to generate a first comment or some other notation related to a content element)
- the system is configured in a similar manner as described above in this paragraph, except that it does not assume that the user wants to record the content element 262 of document 6110 in its exact or closely matching form, unless the user specifically opts to do so.
- when the user clicks or positions the cursor inside the field 362, or otherwise interacts with an element associated with field 362, the document content element 262 does not get automatically populated into the field 362, and the annotation value 6110-322 gets generated or displayed only when the user enters something in the field 362.
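The difference between the data extraction case and the data notation case can be sketched as follows. This is a hedged illustration under stated assumptions: the names `Field`, `Linkage`, `select_content`, and `enter_text`, the task labels `"extraction"` and `"notation"`, and the character-offset spans are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Field:
    """Hypothetical field in Region 300."""
    name: str
    task: str  # "extraction" or "notation" (labels assumed for this sketch)
    value: Optional[str] = None


@dataclass
class Linkage:
    """Hypothetical record tying a mark-up span to a field."""
    markup_span: Tuple[int, int]
    field_name: str


def select_content(document_text: str, start: int, end: int,
                   active_field: Field) -> Linkage:
    """Handle the user selecting document_text[start:end] while a field is active."""
    # A mark-up and a linkage are generated in both cases.
    linkage = Linkage(markup_span=(start, end), field_name=active_field.name)
    if active_field.task == "extraction":
        # Extraction: the selected fragment is recorded in its exact form.
        active_field.value = document_text[start:end]
    # Notation: the field stays empty until the user types something.
    return linkage


def enter_text(target: Field, text: str) -> None:
    """Record text typed by the user into a field."""
    target.value = text


text = "Sample size was 120."
extraction_field = Field(name="sample_size", task="extraction")
notation_field = Field(name="comment", task="notation")

extraction_link = select_content(text, 16, 19, extraction_field)
notation_link = select_content(text, 16, 19, notation_field)
```

After these calls the extraction field holds the selected fragment, while the notation field remains empty until `enter_text` is called on it.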
- the second linkage 822 is interactive and is further configured in such a way that interacting with the first annotation value 6110-322 in Region 300 causes the content mark-up 222 to be displayed or to become more visually salient in Region 200.
- the visual salience of the content mark-up 222 in Region 200 is increased in multiple ways, for example by changing the visual appearance of the content mark-up 222 or its associated content element 262, by causing the document content 250 to scroll in Region 200 to or close to the location where the content mark-up 222 is located within the document content 250, or by positioning the cursor in Region 200 at or in the vicinity of the location where the content mark-up 222 is located inside the document content 250.
- the computer-implemented, annotation-based document management system 10 also comprises a third linkage 832 that connects Region 100 and Region 300.
- the third linkage 832 is interactive and is configured in such a way that a user or the system generating, changing or deleting annotation value 6110-322 in Region 300 leads to a corresponding generation, change, or deletion of the representation 122 for annotation value 6110-322 in Region 100.
- Region 300 also comprises a second field 360, which is used for entering, editing, or displaying a second annotation value 6110-320 (320),
- the second annotation value 6110-320 is associated with document 6110, but does not always have a corresponding content mark-up in Region 200 or a corresponding reference linkage.
- the association between annotation value 6110-320 and Region 200 consists in annotation value 6110-320 referring to at least one element 262 of the content 250 of the document 6110 or one first context element of the document 6110-320, wherein the first context element is manually or automatically generated in the process of the user interacting with the content element 262 of the content 250 of the document 6110.
- Region 100 also comprises a first representation 120 for the annotation value 6110-320.
- the second annotation value 6110-320 (320) in Region 300 and its corresponding representation 120 in Region 100 are connected via a linkage 830.
- the linkage 830 is an alternative embodiment of the third linkage that connects Region 100 and Region 300 (thereby representing an alternative to the linkage 832).
- the computer-implemented, annotation-based document management system 10 further comprises a fourth reference linkage 810, which connects Region 100 to Region 200.
- the fourth reference linkage 810 is interactive and is configured in such a way that interacting with the representation 122 for first annotation value 6110-322 in Region 100 causes the content mark-up 222 to be displayed or to become more visually salient in Region 200.
- the fourth reference linkage 810 is further configured in such a way that interacting with the content mark-up 222 in Region 200 causes the representation 122 for first annotation value 6110-322 to be displayed or to become more visually salient in Region 100.
- Regions 100, 200, and 300 are interlinked via the reference linkages 812 (or, alternatively 814 or 810), 822, and 832, such that a user action performed in a region causes a system response (or calls for a corresponding user action) in another region, enabling a chain of actions-responses, actions-actions, or combinations thereof that enables a full action-response loop.
- the full action-response loop is established from Region 100 back to itself, via Regions 200 and Region 300, wherein a system user interacting with an element of Region 100 causes a response in Region 200, the system user interacting with an element of Region 200 causes a response in Region 300, and the system user interacting with an element of Region 300 causes a response in Region 100.
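The full action-response loop can be sketched as a small registry that maps an interaction in one region to a response in another. This is an illustrative sketch only: the `LinkageBus` class, the `trace_loop` helper, and the response strings are hypothetical, and the region numbers are used purely as dictionary keys.

```python
from collections import defaultdict


class LinkageBus:
    """Hypothetical registry of interactive linkages between regions."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def link(self, source_region, handler):
        # handler(element) returns (target_region, description_of_response).
        self._handlers[source_region].append(handler)

    def interact(self, source_region, element):
        # Fire every linkage registered for the region the user interacted with.
        return [handler(element) for handler in self._handlers[source_region]]


def trace_loop(bus, start_region, element, steps):
    """Follow the first linkage out of each region for a fixed number of steps."""
    visited = [start_region]
    region = start_region
    for _ in range(steps):
        target, _response = bus.interact(region, element)[0]
        visited.append(target)
        region = target
    return visited


bus = LinkageBus()
bus.link(100, lambda el: (200, f"show content for {el}"))
bus.link(200, lambda el: (300, f"highlight annotation value for {el}"))
bus.link(300, lambda el: (100, f"update representation for {el}"))

path = trace_loop(bus, 100, "document 6110", steps=3)
```

Here `path` records the regions visited in order, starting and ending at Region 100 after passing through Regions 200 and 300.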
- a computer-implemented, notation-based document review system 20 for a set of one or more documents, comprising three interlinked regions of a graphical user interface 902, can be constructed from the following elements: a first region of the graphical user interface (Region 200) that displays some or all of the content of a first document 6110, wherein a first element of the content 262 of the first document 6110 is associated with a first content mark-up 222; a second region of a graphical user interface (Region 400) that contains a first annotation value 6110-420 (420) corresponding to the first content mark-up 222 and a second annotation value 6110-422 (422) that can, but does not need to, have a corresponding content mark-up; a third region of a graphical user interface (Region 500), which combines the first annotation value 6110-420 (420) and the second annotation value 6110-422 (422) into a first, editable section 520 of
- annotation value 6110-420 (420) is a notation, such as a comment.
- annotation value 6110-420 (420) can also be a different type of annotation, such as a data extraction, data augmentation, data coding, etc.
- the first linkage 840 connects the first content mark-up 222 in Region 200 and the first data annotation value 6110-420 (420) in Region 400.
- the computer-implemented, notation-based document review system 20 is further configured in such a way as to automatically display a first location marker 464 in the vicinity of the first annotation value 6110-420 (420), the first location marker 464 indicating where the first content mark-up 222 is located within the content of the first document 6110.
- the computer-implemented, notation-based document review system 20 further contains a linkage 842 that connects the first location marker 464 in Region 400 and the first content mark-up 222 in Region 200.
- the linkage 842 is an alternative to linkage 840, such that, in an embodiment, Region 200 and Region 400 are connected via linkage 840 and in an alternative embodiment, Region 200 and Region 400 are connected via linkage 842.
- the second linkage 850 connects data annotation value 6110-420 (420) from the editable section 520 of the review report to the first content mark-up 222 in Region 200.
- the computer-implemented, notation-based document review system 20 further contains a linkage 852 that connects the first location marker 464 in Region 500 and the first content mark-up 222 in Region 200.
- the linkage 852 is an alternative to linkage 850, such that, in an embodiment, Region 200 and Region 500 are connected via linkage 850 and in an alternative embodiment, Region 200 and Region 500 are connected via linkage 852.
- the computer-implemented, notation-based document review system 20 displays Region 200 and Region 400 side-by-side.
- the computer-implemented, notation-based document review system 20 displays Region 200 and Region 500 side-by-side.
- a system user can only see two regions side-by-side at any given time, such that the computer-implemented, annotation-based document management system 20 shows either Region 200 and Region 400 side-by-side, or Region 200 and Region 500 side-by-side.
- a system user can switch between a view that shows Region 200 and Region 400 side-by-side and a view that shows Region 200 and Region 500 side-by-side.
- the horizontal position of Region 200 relative to Region 400 is the mirror opposite of the horizontal position of Region 200 relative to Region 500.
- Region 200 is designated as the main region (based on the assumption the user is primarily interested in seeing the content of document 6110 in Region 200), and is displayed in a more salient way than Region 400, for example by having a larger size than Region 400, being more centrally located than Region 400, etc.
- Region 500 is designated as the main region (based on the assumption the user is primarily interested in seeing and editing the content of the review report in Region 500), and is displayed in a more salient way than Region 200, for example by having a larger size than Region 200, being more centrally located than Region 200, etc.
- when Region 200 and Region 400 are shown side-by-side, Region 400 is shown to the right of Region 200, to make it easy for system users to use their right hand to create annotations.
- users can choose the horizontal position of Region 400 relative to Region 200, to accommodate the needs of left-handed users.
- Region 200, Region 400, and Region 500 are shown in an alternative configuration, such as all three regions shown side-by-side, one region shown on top or below the other two regions, etc.
- the linkage 840 is interactive and configured in such a way that interacting with content mark-up 222 in Region 200 causes data annotation 6110-420 (420) to be displayed or become more visually salient in Region 400.
- the linkage 840 is interactive and configured in such a way that interacting with data annotation value 6110-420 (420) in Region 400 causes content mark-up 222 to be displayed or become more visually salient in Region 200.
- the linkage 842 is interactive and configured in such a way that interacting with content mark-up 222 in Region 200 causes the location marker 464 to be displayed or become more visually salient in Region 400.
- the linkage 842 is interactive and configured in such a way that interacting with the location marker 464 in Region 400 causes content mark-up 222 to be displayed or become more visually salient in Region 200.
- the linkage 850 is interactive and configured in such a way that interacting with content mark-up 222 in Region 200 causes data annotation 6110-420 (420) to be displayed or become more visually salient in Region 500.
- the linkage 850 is interactive and configured in such a way that interacting with data annotation value 6110-420 (420) in Region 500 causes content mark-up 222 to be displayed or become more visually salient in Region 200.
- the linkage 852 is interactive and configured in such a way that interacting with content mark-up 222 in Region 200 causes the location marker 464 to be displayed or become more visually salient in Region 500.
- the linkage 852 is interactive and configured in such a way that interacting with the location marker 464 in Region 500 causes content mark-up 222 to be displayed or become more visually salient in Region 200.
- Content mark-up 222 in Region 200, annotation value 6110-420 (420) in Region 400, and annotation 6110-420 (420) in Region 500 are further interlinked such that, making a change or deletion to Content mark-up 222 results in a corresponding change or deletion to annotation value 6110-420 (420) in Region 400 and annotation 6110-420 (420) in Region 500.
- Region 500 shows a textual or graphical representation for annotation value 6110-420 (420), such as a fragment, an abbreviation, a symbol, an icon, or some other representation.
- Region 400 and Region 500 are editable.
- Section 520 of the review report in Region 500 is editable, such that a user can edit or delete the annotation values 6110-420 (420) and 6110-422 (422) and add additional text element or other content elements in Region 500.
- a text element added by the user to Region 500 can be linked to Region 200 via a content mark-up.
- the system 20 is further configured to automatically generate hyperlinked location references in Region 500, based on the text entered by the user in Region 500. For example, if the text entered by the user references a specific section in the document 6001 (e.g., Abstract, Study 1, General Discussion), the system 20 automatically creates a link between the text mentioning that section and the heading or the beginning of that section in the document 6001 in Region 200.
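One way to picture the automatic generation of hyperlinked location references is a text scan that wraps known section names in links. The sketch below is an assumption-laden illustration: the function name `link_section_references`, the `section_anchors` mapping, the anchor ids, and the Markdown-style link output are all hypothetical choices, not details from the disclosure.

```python
import re


def link_section_references(report_text, section_anchors):
    """Wrap mentions of known section names in links to their anchors.

    section_anchors maps a section heading to a hypothetical anchor id
    inside the document displayed in Region 200.
    """
    if not section_anchors:
        return report_text
    # Try longer names first so a short name cannot shadow a longer one.
    names = sorted(section_anchors, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )

    def to_link(match):
        name = match.group(1)
        return f"[{name}](#{section_anchors[name]})"

    return pattern.sub(to_link, report_text)


anchors = {"Abstract": "sec-abstract", "Study 1": "sec-study-1"}
linked = link_section_references(
    "The Abstract overstates the findings of Study 1.", anchors
)
```

Exact-name matching is the simplest possible rule; a real system would likely need to handle variants such as "the first study", which this sketch does not attempt.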
- location marker 464 is contrasted from annotation value 6110-420 (420), for example, by being shown in a visual area that is separate from annotation value 6110-420 (420) or has a visual styling that contrasts with annotation value 6110-420 (420).
- location marker 464 is shown visually integrated into the text of annotation value 6110-420 (420), for example by being appended next to annotation value 6110-420 (420) and being shown in brackets, for example in the format (p. 23) or (Section 2.3).
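The integrated display of a location marker can be sketched as a small formatting helper. The function name `format_with_marker` and its parameters are hypothetical; only the two bracketed formats, (p. 23) and (Section 2.3), come from the text above.

```python
def format_with_marker(annotation_text, page=None, section=None):
    """Append a bracketed location marker to an annotation's text.

    Preferring a section reference over a page number when both are given
    is an assumption made for this sketch.
    """
    if section is not None:
        marker = f"(Section {section})"
    elif page is not None:
        marker = f"(p. {page})"
    else:
        # No known location: show the annotation without a marker.
        return annotation_text
    return f"{annotation_text} {marker}"


by_page = format_with_marker("Clarify the sampling procedure.", page=23)
by_section = format_with_marker("Clarify the sampling procedure.", section="2.3")
```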
- annotations from each user can be shown in one view. They can be shown as one combined document (e.g., with the annotations displayed side-by-side or one after the other), or can be loaded individually, when one selects a specific reviewer.
- the report reviews generated by the reviewers can be shown in one view. They can be shown as one combined document (e.g., with the annotations displayed side-by-side or one after the other), or can be loaded individually, when one selects a specific report review.
- the system 20 is further configured to allow for the role of a super-user, who can access multiple review reports, and edit them into a composite report (an example of such a super-user being the Editor or Associate Editor of a peer-reviewed journal, who would have access to the review reports created by individual reviewers for a submitted manuscript, and would want to compile them into a report to be shared with the manuscript author(s)).
- the system 20 contains several features that make it easy for the super-user to create a composite report, including the following: the individual annotations can be draggable, from the individual reports to the composite report; each annotation retains information about its provenance (i.e., which user/reviewer it was associated with), for example by being color-coded or associated with a text identifier that symbolizes its provenance, and the provenance information can be visually conveyed while or after an annotation is dragged from one report review to another; each annotation can also retain information about its initial order in the compiled review report, so that, if one or more annotations get dragged out of position in a compiled report review, the super-user can click on a button that restores (or at least displays) the original order of the annotations/notes within the compiled report review; annotations/notes that the super-user has already interacted with (for example, by dragging or editing them) can automatically be rendered in a visual style different from the remaining annotations/notes (for example, through
- the system 20 incorporates an intelligent component that automatically compares and pre-groups annotations together. For example, if annotations from two or more users refer to the same (or similar) location(s) in a document, or are found, via an automated linguistic analysis, to reference the same or similar entities from the original document, the system can use that information to determine that those annotations should be grouped together into the same category. The system then establishes a correspondence between those annotations, wherein the correspondence is visually represented in multiple ways: for example, the annotations are all shown in the same stylistic manner (e.g., same font color), or the annotations can be hyperlinked, so that clicking on one annotation “activates” the remaining corresponding annotations, by marking them up or positioning the cursor over them.
- the pre-grouping together can be further accompanied by a feature that synthesizes consensus versus conflict.
- an automated linguistic analysis of the annotations grouped together determines the degree of overlap in sentiment (or some other evaluative assessment) between the corresponding annotations, and visually indicates whether the different annotations are in consensus or conflict with one another. This visual indication can be done, for example, by showing concurring annotations in the same color, and conflicting annotations/notes in different colors.
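The pre-grouping and consensus-versus-conflict features can be sketched as two small functions. This is a simplified illustration under loud assumptions: grouping uses exact location matches only, the `sentiment` score is taken as a given input from some upstream analysis that is not modelled here, and the sign-based rule and all names (`Annotation`, `group_by_location`, `consensus_label`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Annotation:
    reviewer: str
    location: str     # e.g. a section name; exact-match grouping is an assumption
    text: str
    sentiment: float  # hypothetical score in [-1, 1] from an upstream analysis


def group_by_location(annotations):
    """Pre-group annotations that refer to the same document location."""
    groups = defaultdict(list)
    for annotation in annotations:
        groups[annotation.location].append(annotation)
    return dict(groups)


def consensus_label(group):
    """Label a group by whether its sentiment scores share the same sign."""
    if len(group) < 2:
        return "single"
    signs = {annotation.sentiment >= 0 for annotation in group}
    return "consensus" if len(signs) == 1 else "conflict"


annotations = [
    Annotation("Reviewer 1", "Study 1", "The design is sound.", 0.6),
    Annotation("Reviewer 2", "Study 1", "The design is flawed.", -0.7),
    Annotation("Reviewer 1", "Abstract", "Clear summary.", 0.4),
    Annotation("Reviewer 2", "Abstract", "Well written.", 0.8),
]
groups = group_by_location(annotations)
labels = {location: consensus_label(group) for location, group in groups.items()}
```

A display layer could then map each label to a color, for example one color for concurring groups and contrasting colors within conflicting groups.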
- the system 22 further comprises an export function, which takes the content of an individual or compiled review report and converts it into a document format that can be shared with other users, such as text, Word, PDF, HTML, CSV, etc.
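An export function of this kind can be sketched with the Python standard library for three of the listed formats. The function name `export_report`, the dictionary keys `reviewer`, `location`, and `text`, and the output layouts are assumptions for illustration; Word and PDF output would need third-party tooling and are omitted.

```python
import csv
import html
import io


def export_report(annotations, fmt):
    """Convert a list of annotation dicts into CSV, HTML, or plain text."""
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["reviewer", "location", "text"])
        for a in annotations:
            writer.writerow([a["reviewer"], a["location"], a["text"]])
        return buffer.getvalue()
    if fmt == "html":
        # Escape user-entered text so it cannot break the generated markup.
        items = "".join(
            f"<li>{html.escape(a['text'])} ({html.escape(a['location'])})</li>"
            for a in annotations
        )
        return f"<ul>{items}</ul>"
    if fmt == "text":
        return "\n".join(f"{a['text']} ({a['location']})" for a in annotations)
    raise ValueError(f"unsupported format: {fmt}")


report = [{"reviewer": "R1", "location": "p. 23", "text": "Clarify <n>"}]
csv_output = export_report(report, "csv")
html_output = export_report(report, "html")
text_output = export_report(report, "text")
```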
- Region 300 also comprises a first reference 364 to the first data unit 240 of the first document.
- the computer-implemented, annotation-based document management system 10 further comprises a fifth reference linkage 832 that connects Region 200 and Region 300.
- the fifth reference linkage 832 connects data unit 240 in Region 200 and the first reference 364 to the first data unit 240 in Region 300.
- Reference linkage 832 is configured to be interactive, such that interacting with all or part of data unit 240 in Region 200 causes the first reference 364 to the first data unit 240 to be displayed or become more visually salient in Region 300.
- the fifth reference linkage 832 further connects data unit 240 in Region 200 and the annotation value 6110-322 in Region 300, whereby annotation value 6110-322 has been categorized as being associated with data unit 240.
- the fifth reference linkage 832 is configured to be interactive, such that interacting with annotation value 6110-322 in Region 300 causes all or part of data unit 240 in Region 200 to become more visually salient.
- Reference linkage 832 is configured to be interactive, such that interacting with all or part of data unit 240 in Region 200 causes annotation value 6110-322 to be displayed or become more visually salient in Region 300.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention relates to streamlining the extraction of data from a document or set of documents and the generation of a written review report about a specific source document or set of documents. More particularly, the invention relates to a computer-implemented, annotation-based document management system comprising three interconnected regions of a graphical interface, allowing users to navigate easily from one region to another and to easily identify, interact with, record, or re-check data in different parts of the graphical interface. The invention also relates to a computer-implemented document review system for automatically converting the user's annotations on a source document into a compiled, editable review report, with relevant location references linked to the source document, thereby enabling faster and more accurate creation of a final review document and simplifying the process of verifying the content of the review report against the relevant locations in the source document.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363444861P | 2023-02-10 | 2023-02-10 | |
| US63/444,861 | 2023-02-10 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024168324A1 (fr) | 2024-08-15 |
Family
ID=92263524
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/015291 (WO2024168324A1, fr; Ceased) | Computer-implemented, annotation-based document management and review system | 2023-02-10 | 2024-02-10 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024168324A1 (fr) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120233558A1 (en) * | 2011-03-11 | 2012-09-13 | Microsoft Corporation | Graphical user interface that supports document annotation |
| US20150154165A1 (en) * | 2013-11-29 | 2015-06-04 | Kobo Incorporated | User interface for presenting an e-book along with public annotations |
| US9058588B2 (en) * | 2009-09-30 | 2015-06-16 | Palo Alto Research Center Incorporated | Computer-implemented system and method for managing a context-sensitive sidebar window |
| US20200019600A1 (en) * | 2010-08-04 | 2020-01-16 | Copia Interactive, LLC. | System for and method of annotation of digital content and for sharing of annotations of digital content |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24754181; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |