RU2824338C2

RU2824338C2 - Multistage training of machine learning models for ranking search results

Info

Publication number: RU2824338C2
Application number: RU2021135486A
Authority: RU
Inventors: Александр Алексеевич Боймель; Дарья Михайловна Соболева
Original assignee: Общество С Ограниченной Ответственностью "Яндекс"
Filing date: 2021-12-02
Publication date: 2024-08-07

Abstract

FIELD: physics.

SUBSTANCE: the invention relates to a system and a method for training a machine learning model to rank use-stage digital objects. The method includes: obtaining, by a processor, a first plurality of training digital objects, wherein each training digital object from the first plurality is associated with a past user action parameter indicating actions of past users with that training digital object; training the machine learning model, at a first training stage and based on the first plurality of training digital objects, to determine a predicted user action parameter for a use-stage digital object, wherein the predicted user action parameter indicates actions of future users with the use-stage digital object; obtaining, by the processor, a second plurality of training digital objects, wherein each training digital object from the second plurality is associated with (a) a training search query used to generate that training digital object and (b) a first label indicating the degree of relevance of that object to the training search query; training the machine learning model, at a second training stage following the first training stage and based on the second plurality of training digital objects, to determine a synthesized label for the use-stage digital object indicating the degree of relevance of the use-stage digital object to a use-stage search query; applying, by the processor, the machine learning model to the first plurality of training digital objects to augment an object from the first plurality with a synthesized label, thereby forming a first augmented plurality of training digital objects; and training the machine learning model, based on the first augmented plurality of training digital objects, to determine a relevance parameter of the use-stage digital object, which indicates the degree of relevance of the use-stage digital object to the use-stage search query. The training digital object from the first plurality contains an indication of a digital document associated with document metadata, and training the machine learning model at the first training stage further includes: converting the document metadata into a text representation containing tokens; preprocessing the text representation to mask several tokens in it; and training the machine learning model, based on the first plurality of training digital objects, to determine a masked token from the context provided by neighboring tokens. The relevance parameter of the use-stage digital object further indicates a semantic relevance parameter, that is, the degree of semantic relevance of the use-stage search query to the content of the use-stage digital object.

EFFECT: higher relevance of the search results that a search engine generates in response to a user query, owing to accurate ranking of those results on the search engine results page (SERP) by the machine learning model.

23 cl, 6 dwg

Description

Field of technology to which the invention relates

[01] The present technology relates to machine learning methods, and in particular to methods and systems for training and applying transformer-based machine learning models for ranking search results.

State of the art

[02] Web search is an important task that involves processing billions of user queries every day. Modern web search systems typically rank search results by their relevance to the search query and by other criteria. Determining the relevance of search results to a query often relies on machine learning algorithms trained to use a number of hand-crafted features to estimate various measures of relevance. This determination can be viewed, at least in part, as a language understanding problem, because the relevance of a document to a search query depends at least to some extent on a semantic understanding of both the query and the search results, even when the query and the results share no words or when the results are images, music, or other non-text results.

[03] Recent developments in neural natural language processing include the use of transformer machine learning models, as described in Vaswani et al., “Attention Is All You Need,” Advances in neural information processing systems, pages 5998-6008, 2017. A transformer is a deep learning model (i.e., an artificial neural network or other machine learning model with multiple layers) that uses an attention mechanism to assign greater significance to some parts of the input than to others. In natural language processing, the attention mechanism is used to determine the context for words in the input, since the same word may have different meanings in different contexts. Transformers can process many words or natural language tokens in parallel, which makes parallel training possible.
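The attention mechanism mentioned above can be illustrated with a minimal sketch of scaled dot-product attention for a single query vector, following the general formulation in Vaswani et al. The toy vectors and the function names `softmax` and `attention` are assumptions made for this illustration only; they are not taken from the patent.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # Similarity of the query to every key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy input (hypothetical): three tokens with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([2.0, 0.0], keys, values)
```

Tokens whose keys align with the query receive larger weights, which is the sense in which some parts of the input are given more significance than others.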

[04] Transformers also underlie other advances in natural language processing, including pre-trained systems that are first trained on a large dataset and then fine-tuned for use in specific applications. Examples of such systems include the Bidirectional Encoder Representations from Transformers (BERT) model described in Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” Proceedings of NAACL-HLT 2019, pages 4171-4186, 2019, and the Generative Pre-trained Transformer (GPT) model described in Radford et al., “Improving Language Understanding by Generative Pre-Training,” 2018.

[05] Although transformers have brought significant advances in natural language processing tasks, using them in practice to rank search results can present difficulties. For example, many large search relevance datasets contain non-text data, such as information about which links a user selected, which can be useful in training a ranking model.

Disclosure of invention

[06] Various embodiments of the present technology provide methods for efficiently training transformer models on query metadata and search relevance data, such as click data, during a pre-training stage. These models can then be fine-tuned on smaller crowdsourced relevance datasets for use in ranking search results. The described technology improves the efficiency of systems used to rank search results, so that they can potentially serve tens of millions of active users and process thousands of queries per second.

[07] According to one aspect of the present technology, there is provided a computer-implemented method of training a machine learning model to rank use-stage digital objects generated using a use-stage search query. The method is executed by a processor and includes obtaining, by the processor, a first plurality of training digital objects, where an object from the first plurality of training digital objects is associated with a past user action parameter indicating actions of past users with that object. At a first training stage, the method further includes training the machine learning model, based on the first plurality of training digital objects, to determine a predicted user action parameter for a use-stage digital object, where the predicted user action parameter indicates actions of future users with the use-stage digital object. The method also includes obtaining, by the processor, a second plurality of training digital objects, where an object from the second plurality is associated with (a) a training search query used to generate that object and (b) a first evaluator-generated label indicating the degree of relevance of that object to the training search query as perceived by the human evaluator who assigned the label. At a second training stage following the first training stage, the method includes training the machine learning model, based on the second plurality of training digital objects, to determine a synthesized evaluator label for the use-stage digital object, indicating how relevant the use-stage digital object would be to the use-stage search query as perceived by a human evaluator if the use-stage digital object were presented to that evaluator. The method also includes applying, by the processor, the machine learning model to the first plurality of training digital objects to augment an object from the first plurality with a synthesized evaluator label, thereby forming a first augmented plurality of training digital objects. The method also includes training the machine learning model, based on the first augmented plurality of training digital objects, to determine a relevance parameter of the use-stage digital object indicating the degree of relevance of the use-stage digital object to the use-stage search query.
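The order of the stages in paragraph [07] can be sketched with a deliberately tiny stand-in model. Everything concrete here is an assumption for illustration: a one-feature linear model replaces the transformer, the keys `feature`, `clicks`, `label` and `synth_label` are hypothetical, and the toy data are invented. The final model is trained separately from the labeling model, which is only one of the variants the text allows.

```python
def train(model, examples, target_key, epochs=200, lr=0.1):
    """Fit score = w * x + b to examples[target_key] by stochastic gradient descent."""
    w, b = model
    for _ in range(epochs):
        for ex in examples:
            err = (w * ex["feature"] + b) - ex[target_key]
            w -= lr * err * ex["feature"]
            b -= lr * err
    return (w, b)

def predict(model, ex):
    w, b = model
    return w * ex["feature"] + b

# First plurality: larger set, labeled only with past user actions (toy click rates).
first_set = [{"feature": x / 10, "clicks": 0.5 * x / 10} for x in range(11)]
# Second plurality: small set labeled by human evaluators (toy relevance grades).
second_set = [
    {"feature": 0.0, "label": 0.0},
    {"feature": 0.5, "label": 0.5},
    {"feature": 1.0, "label": 1.0},
]

# Stage 1: pre-train the model to predict user actions.
model = train((0.0, 0.0), first_set, "clicks")
# Stage 2: continue training the same model to predict evaluator labels.
model = train(model, second_set, "label")
# Augmentation: attach a synthesized evaluator label to every first-set object.
augmented = [dict(ex, synth_label=predict(model, ex)) for ex in first_set]
# Final training: learn the relevance parameter from the augmented set.
ranker = train((0.0, 0.0), augmented, "synth_label")
```

The point of the sketch is the data flow: the small human-labeled set is used to label the large click-based set, and the final model then learns from that larger augmented set.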

[08] In some embodiments, an object from the first plurality of training digital objects contains an indication of a digital document associated with document metadata. Training the machine learning model at the first training stage, based on the first plurality of training digital objects, further includes: converting the document metadata into a text representation containing tokens; preprocessing the text representation to mask several tokens in it; and training the machine learning model, based on the first plurality of training digital objects, to determine a masked token from the context provided by neighboring tokens. The relevance parameter of the use-stage digital object further indicates a semantic relevance parameter, that is, the degree of semantic relevance of the use-stage search query to the content of the use-stage digital object. In some embodiments, the document metadata includes at least one of the following: a training search query associated with the object from the first plurality of training digital objects, a title of the digital document, content of the digital document, and a web address associated with the digital document.
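The metadata conversion and masking steps of paragraph [08] might look roughly like the sketch below. The whitespace tokenizer, the `[MASK]` placeholder, the 15% masking rate (the convention used by BERT), the field names and the example document are all assumptions for illustration; the patent does not specify them here.

```python
import random

MASK = "[MASK]"

def metadata_to_tokens(metadata):
    """Flatten document metadata into one whitespace-tokenized text representation."""
    fields = ("query", "title", "content", "url")  # hypothetical field names
    text = " ".join(str(metadata[f]) for f in fields if f in metadata)
    return text.lower().split()

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with MASK; return masked tokens and targets."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_masked = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_masked)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]  # the token the model must recover
        masked[pos] = MASK
    return masked, targets

doc = {
    "query": "cheap flights",
    "title": "Cheap flights to Paris",
    "content": "Compare prices for flights to Paris",
    "url": "example.com/flights",
}
tokens = metadata_to_tokens(doc)
masked, targets = mask_tokens(tokens)
```

During training, the model would see `masked` as input and be asked to predict the entries of `targets` from the surrounding tokens.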

[09] In some embodiments, the method further includes determining the past user action parameter associated with an object from the first plurality of training digital objects based on click data of past users (for example, selections of a search result). In some embodiments, the click data includes data on at least one click made by at least one past user in response to submitting the training search query associated with the object from the first plurality of training digital objects.
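One simple way to derive a past user action parameter from click data is a click-through rate per query-document pair. The patent does not say that this particular statistic is used, so the choice of click-through rate, the log format and the function name below are assumptions for illustration.

```python
from collections import defaultdict

def click_through_rates(log):
    """Aggregate (query, doc_id, clicked) records into a click rate per query-document pair."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, doc_id, was_clicked in log:
        shown[(query, doc_id)] += 1
        if was_clicked:
            clicked[(query, doc_id)] += 1
    return {pair: clicked[pair] / shown[pair] for pair in shown}

# Hypothetical click log: each record is one time a result was shown for a query.
log = [
    ("cheap flights", "doc_a", True),
    ("cheap flights", "doc_a", False),
    ("cheap flights", "doc_b", False),
    ("cheap flights", "doc_a", True),
]
rates = click_through_rates(log)
```

Here `doc_a` was clicked in two of its three impressions and `doc_b` in none, so `doc_a` gets the higher parameter value.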

[010] In some embodiments, before training the machine learning model to determine the relevance parameter of the use-stage digital object, the method further includes obtaining, by the processor, a third plurality of training digital objects, where an object from the third plurality is associated with (a) a training search query used to generate that object and (b) a second evaluator-generated label indicating the degree of relevance of that object to the training search query as perceived by the human evaluator who assigned the label. In these embodiments, the method also includes training the machine learning model, at a third training stage following the second training stage and based on the third plurality of training digital objects, to determine a refined synthesized evaluator label for the use-stage digital object, indicating how relevant the use-stage digital object would be to the use-stage search query as perceived by a human evaluator if the use-stage digital object were presented to that evaluator. The method also includes applying, by the processor, the machine learning model to the first augmented plurality of training digital objects to augment an object from the first augmented plurality with the refined synthesized evaluator label, thereby forming a second augmented plurality of training digital objects. In these embodiments, the method further includes training the machine learning model to determine the relevance parameter of the use-stage digital object based on the second augmented plurality of training digital objects. In some embodiments, any given one of the first, second and third pluralities of training digital objects differs at least in part from each of the other two. In some embodiments, a given one of the first, second and third pluralities of training digital objects is larger than the plurality that follows it.

[011] In some embodiments, after training the machine learning model to determine the relevance parameter of the use-stage digital object, the method further includes obtaining, by the processor, a third plurality of training digital objects, where an object from the third plurality is associated with (a) a training search query used to generate that object and (b) a second evaluator-generated label indicating the degree of relevance of that object to the training search query as perceived by the human evaluator who assigned the label. The method also includes training the machine learning model, based on the third plurality of training digital objects, to determine a refined relevance parameter of the use-stage digital object indicating the degree of relevance of the use-stage digital object to the use-stage search query. In some embodiments, any given one of the first, second and third pluralities of training digital objects differs at least in part from each of the other two. In some embodiments, a given one of the first, second and third pluralities of training digital objects is larger than the plurality that follows it. In some embodiments, the third plurality of training digital objects is identical to the second plurality of training digital objects.

[012] In some embodiments, at the first training stage the machine learning model is trained to determine a rough initial estimate of the relevance parameter of the use-stage digital object. At each subsequent training stage, the machine learning model is trained to improve that rough initial estimate. In some embodiments, the improvement of the rough initial estimate is measured using a metric based on normalized discounted cumulative gain (NDCG).
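For reference, normalized discounted cumulative gain can be computed as in the sketch below. It uses the linear-gain form of the metric (some implementations use an exponential gain instead), and the input format, a list of relevance grades in ranked order, is an assumption for illustration.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each grade is discounted by log2 of its position."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG of the given ordering divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Relevance grades of results in the order a model ranked them (toy values).
perfect = ndcg([3, 2, 1, 0])
reversed_order = ndcg([0, 1, 2, 3])
```

A ranking that places the most relevant results first scores 1.0, and any other ordering of the same results scores lower, so a rise in NDCG between training stages indicates an improved estimate.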

[013] In some embodiments, the machine learning model comprises at least one learning model. In some embodiments, this at least one learning model is a transformer-based learning model.

[014] In some embodiments, the machine learning model comprises at least two learning models. The first of the two learning models is trained to determine the synthesized evaluator label for the use-stage digital object in order to form the first augmented plurality of training digital objects. The second of the two learning models is trained to determine the relevance parameter of the use-stage digital object based on the first augmented plurality of training digital objects. In some embodiments, the first of the two learning models differs from the second. In some embodiments, the first of the two learning models is a transformer-based learning model.

[015] In some embodiments, the method further includes ranking the use-stage digital objects based on the relevance parameters associated therewith. In some embodiments, this ranking includes using another learning model trained to rank the use-stage digital objects using, as input features, the relevance parameters generated by the machine learning model. In some embodiments of the invention, the other learning model is a CatBoost decision-tree-based learning model.
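The two ranking variants described above can be illustrated with a minimal sketch. It is not the claimed implementation: the `UseStageObject` class, the `freshness` feature, and the hand-weighted `toy_ranker` (a stand-in for a trained decision-tree ranker such as CatBoost) are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseStageObject:
    doc_id: str
    relevance: float  # relevance parameter produced by the machine learning model
    freshness: float  # hypothetical extra feature available to the second ranker


def rank_by_relevance(objects):
    """Variant 1: order objects directly by their relevance parameter."""
    return sorted(objects, key=lambda o: o.relevance, reverse=True)


def rank_with_second_model(objects, score_fn):
    """Variant 2: a separate ranker consumes the relevance parameter as a feature."""
    return sorted(objects, key=score_fn, reverse=True)


def toy_ranker(o):
    # Hypothetical stand-in for a trained decision-tree ranker (assumption).
    return 0.6 * o.relevance + 0.4 * o.freshness


results = [
    UseStageObject("a", relevance=0.2, freshness=1.0),
    UseStageObject("b", relevance=0.9, freshness=0.1),
    UseStageObject("c", relevance=0.5, freshness=0.5),
]
direct = [o.doc_id for o in rank_by_relevance(results)]
second = [o.doc_id for o in rank_with_second_model(results, toy_ranker)]
print(direct, second)
```

In the second variant the relevance parameter is only one input among several, so the final order may differ from the order given by relevance alone.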

[016] According to another aspect of the present technology, there is provided a system for training a machine learning model to rank use-stage digital objects generated using a use-stage search query. The system comprises a processor and a memory coupled to the processor and containing a machine learning training module executable by the processor. The machine learning training module comprises instructions that, when executed, cause the processor to perform the following actions: obtaining a first plurality of training digital objects, wherein an object from the first plurality of training digital objects is associated with a past user action parameter indicating user actions of past users with that object; in a first training stage, training the machine learning model, based on the first plurality of training digital objects, to determine a predicted user action parameter for a use-stage digital object, the parameter indicating user actions of future users with the use-stage digital object; obtaining a second plurality of training digital objects, wherein an object from the second plurality of training digital objects is associated with (a) a training search query used to generate that object, and (b) a first evaluator-generated label indicating the degree of relevance of that object to the training search query from the point of view of the human evaluator who assigned the label; in a second training stage following the first training stage, training the machine learning model, based on the second plurality of training digital objects, to determine a synthesized evaluator label of the use-stage digital object, the label indicating the degree of relevance of the use-stage digital object to the use-stage search query from the point of view of a human evaluator, were the use-stage digital object to be presented to the human evaluator; applying the machine learning model to the first plurality of training digital objects to augment an object from the first plurality of training digital objects with the synthesized evaluator label, thereby forming a first augmented plurality of training digital objects; and training the machine learning model, based on the first augmented plurality of training digital objects, to determine a relevance parameter of the use-stage digital object indicating the degree of relevance of the use-stage digital object to the use-stage search query.
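The sequence of stages recited above can be sketched with a toy model. This is a minimal sketch under strong assumptions, not the described system: a one-feature linear regressor stands in for the machine learning model, the data values are invented for illustration, and the final stage starts from fresh weights, which corresponds loosely to the two-model variant. The helper `fit_linear` is a hypothetical name.

```python
def fit_linear(w, b, xs, ys, lr=0.5, epochs=2000):
    """Gradient descent on half the mean squared error of y ~ w*x + b,
    starting from the supplied (w, b) so that stages can continue each other."""
    n = len(xs)
    for _ in range(epochs):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * sum(errs) / n
    return w, b


# First plurality: (feature, past user action parameter), e.g. 1 = clicked.
first_set = [(0.0, 0), (0.2, 0), (0.4, 1), (0.6, 0), (0.8, 1), (1.0, 1)]
# Second plurality: (feature, evaluator-generated relevance label).
second_set = [(0.1, 0.0), (0.5, 0.5), (0.9, 1.0)]
xs1 = [x for x, _ in first_set]

# Stage 1: learn to predict user actions from the large click-based set.
w, b = fit_linear(0.0, 0.0, xs1, [c for _, c in first_set])

# Stage 2: continue from the stage-1 weights on the small human-labelled set.
w, b = fit_linear(w, b, [x for x, _ in second_set], [y for _, y in second_set])

# Augmentation: attach a synthesized evaluator label to every first-set object.
augmented = [(x, click, w * x + b) for x, click in first_set]

# Final stage: train a model on the augmented set to output the relevance parameter.
rw, rb = fit_linear(0.0, 0.0, xs1, [s for _, _, s in augmented])
relevance = [rw * x + rb for x in xs1]
print(relevance)
```

The point of the sketch is the data flow: the small human-labelled set is used to label the large click-based set, and the final model is trained on that enlarged, labelled set.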

[017] In some embodiments of the invention, an object from the first plurality of training digital objects comprises an indication of a digital document associated with document metadata. The machine learning training module also comprises instructions that, when executed, cause the processor to train the machine learning model in the first training stage, based on the first plurality of training digital objects, by: converting the document metadata into a text representation thereof comprising tokens; pre-processing the text representation to mask several tokens therein; and training the machine learning model, based on the first plurality of training digital objects, to determine a token from among the masked tokens based on the context provided by neighboring tokens. In these embodiments of the invention, the relevance parameter of the use-stage digital object further indicates a semantic relevance parameter indicating the degree of semantic relevance of the use-stage search query to the content of the use-stage digital object.
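The metadata-to-text conversion and masking steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the metadata field names (`title`, `url`, `body`), whitespace tokenization, the `[MASK]` symbol, and the masking rate are illustrative choices that the text does not specify.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol (assumption)


def metadata_to_tokens(metadata):
    """Flatten document metadata into text and split it into tokens.
    The field names and whitespace tokenization are assumptions."""
    fields = ("title", "url", "body")
    text = " ".join(str(metadata[k]) for k in fields if k in metadata)
    return text.lower().split()


def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """Replace a random subset of tokens with MASK.
    Returns the masked sequence and a {position: original token} map that a
    model would be trained to recover from the surrounding context."""
    if not tokens:
        return [], {}
    rng = rng or random.Random(0)
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    masked = list(tokens)
    targets = {}
    for pos in rng.sample(range(len(tokens)), n_mask):
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


tokens = metadata_to_tokens(
    {"title": "Cheap flights to Paris", "url": "example.com/flights"}
)
masked, targets = mask_tokens(tokens, mask_rate=0.3, rng=random.Random(1))
print(masked, targets)
```

The `targets` map is the training signal: for each masked position, the model is asked to predict the original token given the unmasked neighbors.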

[018] In some embodiments of the invention, the machine learning training module further comprises instructions that, when executed, cause the processor, prior to training the machine learning model to determine the relevance parameter of the use-stage digital object, to perform the following actions: obtaining a third plurality of training digital objects, wherein an object from the third plurality of training digital objects is associated with (a) a training search query used to generate that object, and (b) a second evaluator-generated label indicating the degree of relevance of that object to the training search query from the point of view of the human evaluator who assigned the label; in a third training stage following the second training stage, training the machine learning model, based on the third plurality of training digital objects, to determine a refined synthesized evaluator label of the use-stage digital object, the label indicating the degree of relevance of the use-stage digital object to the use-stage search query from the point of view of a human evaluator, were the use-stage digital object to be presented to the human evaluator; applying the machine learning model to the first augmented plurality of training digital objects to augment an object from the first augmented plurality of training digital objects with the refined synthesized evaluator label, thereby forming a second augmented plurality of training digital objects; and training the machine learning model, based on the second augmented plurality of training digital objects, to determine the relevance parameter of the use-stage digital object.

Brief description of the drawings

[019] These and other features, aspects and advantages of the present technology are explained in the following description, in the appended claims and in the following drawings.

[020] Fig. 1 is a diagram of an example computer system for use in some embodiments of systems and/or methods according to the present technology.

[021] Fig. 2 is a block diagram of the architecture of a machine learning model according to various embodiments of the present technology.

[022] Fig. 3 illustrates the structure of data sets that may be used to pre-train and fine-tune a machine learning model for use in ranking search results according to various embodiments of the present technology.

[023] Fig. 4 is a flow chart of the pre-training and fine-tuning stages performed to train a machine learning model to generate relevance scores, according to various embodiments of the present technology.

[024] Fig. 5 is a flow chart of a computer-implemented method for training a machine learning model according to various embodiments of the present technology.

[025] Fig. 6 is a flow chart of the use of a fully trained machine learning model for ranking search results according to various embodiments of the present technology.

Implementation of the invention

[026] Various exemplary embodiments of the present technology are described more fully below with reference to the accompanying drawings. However, the present technology can be implemented in many different forms and should not be construed as limited to the exemplary embodiments described herein. The absolute and relative sizes of layers and regions may be exaggerated in the drawings for clarity. Like numerals refer to like elements throughout.

[027] The examples and conditional language provided herein are intended to aid understanding of the principles of the present technology, not to limit its scope to such specifically recited examples and conditions. Those skilled in the art may devise various methods and devices that are not explicitly described or shown herein but that nevertheless embody the principles of the present technology within its spirit and scope.

[028] To facilitate understanding, the following description may contain simplified implementations of the present technology. Those skilled in the art will understand that other embodiments of the present technology may be significantly more complex.

[029] In some cases, helpful examples of modifications of the present technology are given. They aid understanding but likewise do not define the scope or boundaries of the present technology. The list of modifications is not exhaustive, and a person skilled in the art can develop other modifications within the scope of the present technology. In addition, where no modifications are described, this does not mean that none are possible and/or that the description sets out the only possible implementation of a particular element of the present technology.

[030] Although the numerals "first", "second", "third", etc. are used herein to describe various elements, these elements should not be limited by such numerals. The numerals are used only to distinguish one element from another. Thus, a first element discussed below could be termed a second element without departing from the scope of the present technology. In the context of this document, the term "and/or" covers any one of the listed elements and all combinations of them.

[031] It should be understood that when an element is said to be connected or coupled to another element, it may be connected or coupled to the other element directly, or intermediate elements may be present. If an element is said to be directly connected or directly coupled to another element, there are no intermediate elements. Other words used to describe the relationship between elements should be understood in a similar way (e.g., "between" and "directly between", "adjacent" and "directly adjacent", etc.).

[032] The terminology used herein is intended only to describe specific exemplary embodiments, not to limit the scope of protection of the present technology. Words used herein in the singular also include the plural, unless the context clearly indicates otherwise. It should be understood that the terms "comprises" and/or "comprising" as used herein specify the presence of the stated features, parts, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, parts, steps, operations, elements, components and/or groups thereof.

[033] The functions of the various elements shown in the drawings, including any functional block designated as a "processor", may be implemented using dedicated hardware as well as hardware capable of executing software. If a processor is used, these functions may be performed by a single dedicated processor, a single shared processor, or a plurality of separate processors, some of which may be shared. In some embodiments of the present technology, the processor may be a general-purpose processor, such as a central processing unit (CPU), or a specialized processor, such as a digital signal processor (DSP). Furthermore, the explicit use of the term "processor" should not be construed as referring exclusively to hardware capable of executing software, and may include, but is not limited to, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a read-only memory (ROM) for storing software, a random access memory (RAM), and a non-volatile memory. Other hardware, general-purpose and/or custom, may also be included.

[034] Software modules, or simply modules or blocks intended to be implemented in software, may be represented herein as any combination of flow chart elements or other elements indicating the performance of process steps and/or containing a textual description. Such modules may be executed by hardware that is shown explicitly or implied. In addition, it should be understood that a module may include, for example and without limitation, computer program logic, computer program instructions, application software, a stack, firmware, hardware circuitry, or a combination thereof that provides the required capabilities.

[035] As used herein, the term "database" means any structured collection of data, regardless of its specific structure, the database management software, or the computer hardware on which the data is stored, used, or otherwise made available. A database may reside on the same hardware as the process that stores or uses the information held in the database, or it may reside on separate hardware, such as a dedicated server or a plurality of servers.

[036] The present technology may be implemented as a system, a method, and/or a computer program product. The computer program product may comprise a computer-readable storage medium (or several such media) storing computer-readable program instructions that, when executed, cause a processor to implement aspects of the present technology. The computer-readable storage medium may, for example, be an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. A non-exhaustive list of more specific examples of a computer-readable storage medium includes a portable computer disk, a hard disk, RAM, ROM, flash memory, an optical disk, a memory card, a floppy disk, a medium with mechanical or visual encoding (e.g., a punch card or a bar code), and/or any combination thereof. In the context of this document, a computer-readable storage medium is to be understood as a physical computer-readable storage medium. It is not to be understood as a transitory signal, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted over wires.

[037] It should be understood that the computer-readable program instructions may be downloaded from a computer-readable storage medium to respective computing or processing devices, or to an external computer or external storage device, via a network, such as the Internet, a local area network, a wide area network and/or a wireless network. A network interface in the computing or processing device may receive the computer-readable program instructions via the network and forward them for storage in a computer-readable storage medium within the respective computing or processing device.

[038] Computer-readable program instructions for performing operations according to the present technology may be assembler instructions, machine instructions, firmware instructions, configuration data for integrated circuits, or other source code or object code written in any combination of programming languages. The computer-readable program instructions may be executed entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer via any type of network.

[039] The description of the principles, aspects, and embodiments of the present technology, as well as specific examples thereof, is intended to cover their structural and functional equivalents, regardless of whether they are currently known or will be developed in the future. For example, those skilled in the art should understand that any block diagrams described herein correspond to conceptual representations of illustrative circuitry implementing the principles of the present technology. It should also be understood that any flow charts, process diagrams, state transition diagrams, pseudocodes, and the like correspond to various processes that can be represented in machine-readable program instructions. These machine-readable program instructions may be provided to a processor or other programmable data processing device to form a machine, so that the instructions, when executed by the processor of the computer or other programmable data processing device, create means for implementing the functions or actions specified in the flow chart and/or in a block or blocks of the flow chart. These machine-readable program instructions may also be stored in a machine-readable storage medium that can direct a computer, a programmable data processing device, and/or other devices to operate in a particular manner, so that the machine-readable storage medium with the instructions stored therein constitutes an article of manufacture containing instructions that implement aspects of the functions or actions specified in the flow charts, process diagrams, state transition diagrams, pseudocodes, and the like.

[040] These machine-readable program instructions may also be loaded into a computer, other programmable data processing device, or other devices to cause a sequence of operational steps to be performed in the computer, other programmable device, or other devices, forming a computer-implemented process, so that the instructions executed in the computer, other programmable device, or other devices implement the functions or actions specified in the flow charts, process diagrams, state transition diagrams, pseudocodes, and the like.

[041] In some alternative embodiments of the invention, the functions specified in the flow charts, process diagrams, state transition diagrams, pseudocodes, and the like may be carried out in an order different from that shown in the drawings. For example, two blocks shown in a flow chart as sequential may in fact be executed simultaneously, or the blocks may sometimes be executed in the reverse order, depending on the function being implemented. It should also be noted that each function shown in the drawings, and combinations of such functions, may be implemented by systems based on special-purpose hardware that perform the specified functions or actions, or by combinations of special-purpose hardware and computer instructions.

[042] With the above principles in mind, some non-limiting examples illustrating various implementations of aspects of the present technology are considered below.

Computer system

[043] Fig. 1 shows a computer system 100. The computer system 100 may be a multi-user computer, a single-user computer, a laptop, a tablet computer, a smartphone, an embedded control system, or any other computer system that is currently known or will be developed in the future. In addition, it should be understood that some or all of the elements of the computer system 100 may be virtualized and/or cloud-based. As shown in Fig. 1, the computer system 100 includes one or more processors 102, memory 110, a data storage interface 120, and a network interface 140. These system elements are interconnected via a bus 150, which may include one or more internal and/or external buses (not shown), such as a PCI bus, a USB bus, an IEEE 1394 FireWire bus, a SCSI bus, a Serial-ATA bus, etc., to which the various hardware elements are electronically connected.

[044] Memory 110, which may be RAM or any other type of memory, may contain data 112, an operating system 114, and a program 116. Data 112 may be any data serving as input to or output from any program in the computer system 100. Operating system 114 is an operating system such as MICROSOFT WINDOWS or LINUX. Program 116 may be any program or set of programs containing program instructions that can be executed by a processor to control actions performed by the computer system 100. For example, program 116 may be a machine learning training module that trains a machine learning model, as described below. Program 116 may also be a system that uses a trained machine learning model to rank search results, as described below.

[045] The data storage interface 120 is used to connect storage devices, such as the storage device 125, to the computer system 100. One type of storage device 125 is a solid-state drive, which may use an integrated circuit assembly to store data persistently. Another type of storage device 125 is a hard disk drive, that is, an electromechanical device that uses magnetic storage to store and retrieve digital data. The storage device 125 may also be an optical disk drive, a reader for memory cards such as SD cards, or a flash memory device that can be connected to the computer system 100, for example, via a universal serial bus (USB).

[046] In some embodiments of the invention, the computer system 100 may employ well-known virtual memory mechanisms that allow programs of the computer system 100 to operate as if they had access to a single large contiguous address space, rather than to multiple smaller storage areas such as memory 110 and storage device 125. Thus, although data 112, operating system 114, and program 116 are shown as residing in memory 110, one skilled in the art will appreciate that these elements need not all be contained entirely within memory 110 at the same time.

[047] The processors 102 may comprise one or more microprocessors and/or other integrated circuits. The processors 102 execute program instructions stored in the memory 110. When the computer system 100 starts up, the processors 102 may first perform a boot procedure and/or execute program instructions that make up the operating system 114.

[048] The network interface 140 is used to connect the computer system 100 to other computer systems or network devices (not shown) via a network 160. The network interface 140 may include a combination of hardware and software that enables communication over the network 160. In some embodiments of the invention, the network interface 140 may be a wireless network interface. The software of the network interface 140 may include software that uses one or more network protocols to communicate over the network 160. For example, the network protocols may include the Transmission Control Protocol/Internet Protocol (TCP/IP).

[049] It should be understood that the computer system 100 is merely an example and that the described technology may be used with computer systems or other computing devices of a different configuration.

Machine learning model architecture

[050] Fig. 2 is a block diagram of a machine learning model architecture 200 according to various embodiments of the present technology. The machine learning model architecture 200 is based on the BERT machine learning model, as described, for example, in the above-mentioned work (Devlin et al.). Like the BERT model, the machine learning model architecture 200 comprises a transformer stack 202 made up of transformer blocks, including, for example, transformer blocks 204, 206, and 208.

[051] Each of the transformer blocks 204, 206, and 208 comprises a transformer encoder block, for example as described in the above-mentioned work (Vaswani et al.). Each of the transformer blocks 204, 206, and 208 comprises a multi-head attention layer 220 (shown for illustration only in the transformer block 204) and a feedforward neural network layer 222 (also shown for illustration only in the transformer block 204). The transformer blocks 204, 206, and 208 typically have the same structure but, after training, different weights. The multi-head attention layer 220 introduces dependencies between the inputs of the transformer block; these can be used, for example, to provide contextual information for each input element based on every other input element of the transformer block. The feedforward neural network layer 222 typically has no such dependencies, so its inputs can be processed in parallel. It should be understood that although only three transformer blocks (transformer blocks 204, 206, and 208) are shown in Fig. 2, in actual implementations of the present technology the transformer stack 202 may contain many more transformer blocks. For example, in some embodiments of the invention, the transformer stack 202 may use 12 transformer blocks.
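
As an illustrative sketch only, and not the implementation described in this publication, the following Python code contrasts the two layers of a transformer block. It assumes a single attention head with identity query, key, and value projections, which is a deliberate simplification of the multi-head attention layer 220. The function names and the weight layout are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the input vectors themselves
    (identity projections). Every output mixes information from every input,
    which is the dependency between inputs described in the text.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


def feed_forward(vector, w1, b1, w2, b2):
    """Position-wise feedforward layer applied to ONE element.

    w1 and w2 are lists of rows, one row of weights per output unit.
    The function never looks at other elements, so calls for different
    elements are independent and could run in parallel.
    """
    hidden = [
        max(0.0, sum(x * w for x, w in zip(vector, row)) + b)  # ReLU
        for row, b in zip(w1, b1)
    ]
    return [sum(h * w for h, w in zip(hidden, row)) + b for row, b in zip(w2, b2)]
```

In this sketch, changing any one input vector changes every attention output, whereas `feed_forward` depends only on the single vector passed to it.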

[052] The input data 230 of the transformer stack 202 comprises tokens, such as the [CLS] token 232 and the tokens 234. The tokens 234 may, for example, represent words or parts of words. The [CLS] token 232 is used as a representation for classifying the entire set of tokens 234. Each token 234 and the [CLS] token 232 is represented by a vector. In some embodiments of the invention, each of these vectors may, for example, have a length of 768 floating-point values. It should be understood that a variety of compression methods may be used to effectively reduce the size of the tokens. In various embodiments of the invention, a fixed number of tokens 234 may be used as the input data 230 of the transformer stack 202. For example, in some embodiments of the invention, 1024 tokens may be used, and in other embodiments of the invention, the transformer stack 202 may receive 512 tokens (in addition to the [CLS] token 232). Input data 230 shorter than this fixed number of tokens 234 may be extended to the fixed length by adding padding tokens.
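
The fixed-length input described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `[PAD]` token name follows the common BERT convention and is not given in the text, truncation of over-long inputs is an assumption, and `build_input` is a hypothetical helper.

```python
CLS = "[CLS]"
PAD = "[PAD]"  # assumed name for the padding token


def build_input(tokens, max_tokens=512):
    """Return [CLS] followed by exactly max_tokens tokens.

    Shorter inputs are extended with padding tokens. Longer inputs are
    truncated, which is an assumption made for this sketch.
    """
    kept = list(tokens[:max_tokens])
    padding = [PAD] * (max_tokens - len(kept))
    return [CLS] + kept + padding


example = build_input(["search", "query"], max_tokens=4)
# example == ["[CLS]", "search", "query", "[PAD]", "[PAD]"]
```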

[053] In some embodiments of the invention, the input data 230 may be generated by a tokenizer 238 from a digital object 236, such as an element of a training set. The architecture of the tokenizer 238 typically depends on the digital object 236 used as input to the tokenizer 238. For example, to generate the input data 230, the tokenizer 238 may use known encoding methods, such as byte pair encoding, and may also use pre-trained neural networks.
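
For readers unfamiliar with byte pair encoding, the sketch below learns merge rules from a tiny corpus. It is a generic textbook-style illustration that operates on characters rather than bytes, and it is not claimed to be the tokenizer 238. All function names are hypothetical, and the tie-breaking rule is an arbitrary choice made so that results are deterministic.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    if not pairs:
        return None
    # Ties are broken by the pair itself so the result is deterministic.
    return max(pairs, key=lambda p: (pairs[p], p))


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to num_merges merge rules from a list of words."""
    words = dict(Counter(tuple(word) for word in corpus))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


merges, vocab = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
```

Each learned merge turns a frequent pair of symbols into one symbol, so frequent words end up as single tokens while rare words are split into parts of words.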

[054] The output data 250 of the transformer stack 202 comprises the [CLS] output 252 and the vector outputs 254, including a vector output for each token 234 of the input data 230 of the transformer stack 202. The output data 250 may then be sent to the task module 270. In some embodiments of the invention, as shown in Fig. 2, the task module uses only the [CLS] output 252, which represents the entire set of outputs 254. This may be most useful when the task module 270 is used as a classifier or to output a label or value characterizing the entire input digital object 236, for example, to generate a relevance score or the probability of a "click" on a document (that is, of a user selecting a search result). In some embodiments of the invention (not shown in Fig. 2), all or some of the outputs 254, and possibly the [CLS] output 252, may be used as input to the task module 270. This is most useful when the task module 270 is used to generate labels or values for individual input tokens 234, for example, to predict a masked or missing token or for named entity recognition. In some embodiments of the invention, the task module 270 may comprise a feedforward neural network (not shown) that generates a task-specific result 280, such as a relevance score or a "click" probability. Other models may also be used in the task module 270. For example, the task module 270 may be a transformer or another type of neural network. In addition, the task-specific result may be used as input to other models, such as the CatBoost model, as described in Dorogush et al., "CatBoost: gradient boosting with categorical features support", NIPS, 2017.
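
A task module that consumes only the [CLS] output can be sketched as below. This is a minimal illustration under assumptions: the head is a single linear layer followed by a sigmoid, the [CLS] output is taken to be the first vector in the list, and the name `click_probability` and its parameters are hypothetical. A real task module may be deeper or of a different kind.

```python
import math


def click_probability(stack_outputs, weights, bias):
    """Map the [CLS] output vector to a probability in (0, 1).

    stack_outputs: list of output vectors, with the [CLS] output assumed
    to be at index 0. The per-token outputs are ignored here.
    """
    cls_output = stack_outputs[0]
    logit = sum(x * w for x, w in zip(cls_output, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

Because only `stack_outputs[0]` is read, the per-token outputs influence the result solely through what the attention layers have already mixed into the [CLS] vector.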

[055] It should be understood that the architecture described with reference to Fig. 2 is simplified for ease of understanding. For example, in practical implementations of the machine learning model architecture 200, each of the transformer blocks 204, 206, and 208 may include layer normalization operations, the task module 270 may contain a softmax normalization function, and so on. Those skilled in the art will understand that these operations are widely used in neural networks and deep learning models such as the machine learning model architecture 200.

Pre-training and fine-tuning

[056] According to various embodiments of the present technology, a machine learning model with the architecture shown in Fig. 2 can be trained using pre-training and fine-tuning processes, as described below. Fig. 3 shows data sets that can be used to pre-train and fine-tune a machine learning model for use in ranking search results.

[057] The data sets comprise a "Docs" data set 302, which is a large collection of unlabeled documents 303 with a maximum token length 304 of 1024. The "Docs" data set 302 is used for pre-training using masked language modeling (MLM) (see below). Pre-training on the "Docs" data set 302 is used to provide a kind of base language model that helps improve the quality of subsequent training and keep training stable. In some embodiments of the invention, the "Docs" data set 302 may contain approximately 600 million training digital objects (i.e., unlabeled documents with a maximum token length of 1024).

[058] The data sets also comprise a "Clicks" data set 310, each element 311 of which comprises a user query 312 and a document 314 from the search results for the user query 312, and is labeled with "click" information 316 indicating whether the user selected the document 314. In addition to the query text, the query 312 comprises query metadata 313, which may include, for example, the geographic region from which the query was sent. Similarly, the document 314 comprises the document text and document metadata 315, which may include the document title and the web address of the document (for example, in the form of a URL).

[059] In some embodiments of the invention, the "click" information 316 may be pre-processed so that it indicates the user's selection of a document only in the case of a "long click", where the user dwells on the selected document for an extended time. "Long clicks" are a widely used indicator of the relevance of a search result to a query, since they suggest that the user may have found relevant information in the document rather than simply clicking on the document and quickly returning to the search results. For example, in some embodiments of the invention, a "long click" may indicate that the user dwelled on the document for at least 120 seconds.
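
The "long click" pre-processing can be sketched as a simple thresholding step. This is an illustration only: the `ClickRecord` fields and the `click_label` function are hypothetical, and only the 120-second threshold comes from the example in the text.

```python
from dataclasses import dataclass

LONG_CLICK_SECONDS = 120.0  # example threshold mentioned in the text


@dataclass
class ClickRecord:
    """Hypothetical element of a click log: a query, a document, and user behavior."""
    query: str
    document_url: str
    clicked: bool
    dwell_seconds: float = 0.0


def click_label(record, threshold=LONG_CLICK_SECONDS):
    """Return 1 only for a long click, otherwise 0.

    A short click followed by a quick return to the results page is
    treated the same as no click at all.
    """
    return 1 if record.clicked and record.dwell_seconds >= threshold else 0


records = [
    ClickRecord("bert ranking", "https://example.com/a", True, 300.0),
    ClickRecord("bert ranking", "https://example.com/b", True, 5.0),
    ClickRecord("bert ranking", "https://example.com/c", False),
]
labels = [click_label(r) for r in records]  # [1, 0, 0]
```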

[060] Because the "Clicks" data set 310 is based on information routinely collected as users use the search engine, it is extremely large. For example, in some embodiments of the invention, the "Clicks" data set 310 may contain approximately 23 billion training digital objects (i.e., elements containing a query and a document and labeled with "click" information). Because of its size, the "Clicks" data set 310 forms the main part of the training pipeline and is used in pre-training, as described below.

[061] The data sets also include relevance data sets 350 used for fine-tuning, as described below. In some embodiments of the invention, the relevance data sets 350 include a "Rel-Big" data set 352 (a large relevance data set), a "Rel-Mid" data set 354 (a medium relevance data set), and a "Rel-Small" data set 356 (a small relevance data set). Items 357, 358, and 359 of these data sets contain queries 360, 362, and 364, respectively, and documents 370, 372, and 374, respectively. The items of the relevance data sets 350 are labeled with relevance scores 380, 382, and 384, respectively. The relevance scores 380, 382, and 384 are based on input from human assessors regarding how relevant the documents are to the search query. This input may be obtained through crowdsourcing or other means of collecting data from people about the relevance of a document to a query.

[062] Because the relevance scores 380, 382, and 384 are based on input from human assessors, collecting the relevance data sets 350 may require more time and expense than collecting the other data sets used to train the machine learning model. The relevance data sets 350 are therefore much smaller than the other data sets and are used for fine-tuning rather than pre-training. For example, in some embodiments of the invention, the "Rel-Big" data set 352 may contain approximately 50 million training digital objects (i.e., items), the "Rel-Mid" data set 354 may contain approximately 2 million training digital objects, and the "Rel-Small" data set 356 may contain approximately 1 million training digital objects. In general, the relevance data sets 350 differ in size, age, and closeness to the latest methods of calculating relevance scores: the "Rel-Big" data set 352 is the largest and oldest (in terms of the age of the data and of the methods used to calculate the relevance scores), and the "Rel-Small" data set 356 is the smallest and newest.

[063] Fig. 4 is a flowchart 400 of the pre-training and fine-tuning stages performed to train a machine learning model to generate relevance scores, according to various embodiments of the present technology. In a first stage 402, the machine learning model is pre-trained using the "Docs" data set 302 (see Fig. 3) and the MLM task.

[064] The MLM task is based on one of the two unsupervised learning tasks used in the BERT model, which is used to learn text representations from collections of unlabeled documents (note that the other unsupervised learning task used in the BERT model is next sentence prediction, which is generally not used in embodiments of the present technology). For pre-training on the MLM task, one or more tokens of the input to the machine learning model are masked by replacing them with a special [MASK] token (not shown). The machine learning model is trained to predict, for each masked token, the probability that it corresponds to each token in the token vocabulary. The prediction is based on the outputs of the last layer of the transformer stack (see above) of the machine learning model that correspond to the masked tokens, each such output being a vector. Because the actual masked tokens are known (i.e., they serve as the ground truth), a cross-entropy loss measuring the deviation of the predicted probabilities from the actual masked tokens (referred to here as the MLM loss) is calculated and used to adjust the weights of the machine learning model so as to reduce the loss.
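
As a non-limiting illustration, the masking step and the MLM loss may be sketched as follows. The masking probability, the rule that at least one token is always masked, and the function names are assumptions made for this sketch; the description states only that one or more tokens are replaced with [MASK] and that a cross-entropy loss is computed at the masked positions. The per-position logits stand in for the last-layer outputs after projection onto the vocabulary.

```python
import math
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and a dict {position: original token}.
    At least one token is always masked so that the loss is defined.
    """
    rng = rng or random.Random(0)
    positions = [i for i in range(len(tokens)) if rng.random() < mask_prob]
    if not positions:
        positions = [rng.randrange(len(tokens))]
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = tokens[i]
        masked[i] = MASK
    return masked, targets


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlm_loss(logits_by_position, targets, vocab):
    """Mean cross-entropy over the masked positions."""
    index = {tok: i for i, tok in enumerate(vocab)}
    losses = []
    for pos, true_token in targets.items():
        probs = softmax(logits_by_position[pos])
        losses.append(-math.log(probs[index[true_token]]))
    return sum(losses) / len(losses)


vocab = ["search", "engine", "ranks", "documents"]
tokens = ["search", "engine", "ranks", "documents"]
masked, targets = mask_tokens(tokens, mask_prob=0.5, rng=random.Random(1))
# An untrained model that assigns equal logits to every vocabulary entry.
uniform_logits = {pos: [0.0] * len(vocab) for pos in targets}
loss = mlm_loss(uniform_logits, targets, vocab)
```

With equal logits the loss equals the logarithm of the vocabulary size, which is the value an uninformed model would be expected to start from; training reduces the loss by raising the probability of the actual masked tokens.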

[065] In the second pre-training stage 404, training digital objects from the "Clicks" data set 310 (see Fig. 3) are used to pre-train the machine learning model. To this end, the query, including the query metadata, and the document, including the document metadata, are tokenized. The tokenized query and document are used as input to the machine learning model, with one or more tokens masked as in the first stage. In this way, the query metadata and the document metadata, which contain information such as the web address of the document and the geographic region of the query, are fed directly to the machine learning model together with the natural-language text of the query and of the document.
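
As a non-limiting illustration, one way of feeding the metadata to the model together with the text is to concatenate all fields into a single token sequence. The field order, the [SEP] separators (a BERT convention), the length limit, and the whitespace tokenizer below are assumptions made for this sketch and are not specified in the description.

```python
CLS, SEP = "[CLS]", "[SEP]"


def simple_tokenize(text):
    """Placeholder tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_input(query_text, query_region, doc_url, doc_title, doc_text,
                max_len=64):
    """Concatenate query, query metadata, document metadata and document text."""
    segments = [query_text, query_region, doc_url, doc_title, doc_text]
    tokens = [CLS]
    for segment in segments:
        tokens.extend(simple_tokenize(segment))
        tokens.append(SEP)
    # Truncate to the assumed maximum input length.
    return tokens[:max_len]


input_tokens = build_input(
    query_text="Cheap flights",
    query_region="Moscow",
    doc_url="https://example.com/flights",
    doc_title="Flight deals",
    doc_text="Compare prices for flights",
)
```

A subword tokenizer would normally replace `simple_tokenize`, so that a web address is split into pieces found in the vocabulary rather than kept as one token.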

[066] To convert the query and the document, including metadata, of a training digital object from the "Clicks" data set into tokens, a pre-built vocabulary of tokens may be used that is suitable for the natural-language text and the kinds of metadata used in the "Clicks" data set. In some embodiments of the invention, this may be done using the WordPiece byte pair encoding scheme used in the BERT model, with a sufficiently large vocabulary size. For example, in some embodiments of the invention, the vocabulary may contain approximately 120,000 tokens. In some embodiments of the invention, the text may be pre-processed, for example by converting all words to lowercase and applying Unicode NFC normalization. A WordPiece byte pair encoding scheme that may be used in some embodiments of the invention to build the token vocabulary is described, for example, in Rico Sennrich et al., "Neural Machine Translation of Rare Words with Subword Units," Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715-1725, 2016.
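
As a non-limiting illustration, the text pre-processing and the application of a pre-built vocabulary may be sketched as follows. The sketch covers only applying a vocabulary, not building one; the toy vocabulary is hypothetical, and the greedy longest-match rule with a "##" continuation prefix follows the convention of BERT's WordPiece tokenizer, which is an assumption about the embodiment rather than a statement from the description.

```python
import unicodedata


def normalize(text):
    """Lowercase the text and apply Unicode NFC normalization."""
    return unicodedata.normalize("NFC", text.lower())


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword tokens."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation of a word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no vocabulary entry covers this position
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """Normalize the text, split on whitespace, and split words into subwords."""
    tokens = []
    for word in normalize(text).split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


toy_vocab = {"rank", "##ing", "search", "caf\u00e9"}
ranking_tokens = tokenize("Ranking Search", toy_vocab)
# "e" followed by a combining acute accent is composed into one code point.
cafe_tokens = tokenize("Cafe\u0301", toy_vocab)
unknown_tokens = tokenize("xyz", toy_vocab)
```

NFC normalization matters here because the same visible string can be encoded in several ways; without it, the decomposed form of "café" would not match the vocabulary entry.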

[067] In the second pre-training stage 404, the machine learning model is trained using the MLM loss on the masked tokens, as described above. The machine learning model is also configured with a neural network-based classifier as a task module (as described with reference to Fig. 2) that predicts the probability of a "click" on the document. In some embodiments of the invention, the predicted "click" probability may be determined based on the [CLS] output. Because the training digital objects from the "Clicks" data set contain information on whether or not the user selected the document, this ground truth may be used, for example, to determine a cross-entropy loss (referred to as the "click" prediction loss) representing the distance or difference between the predicted "click" probability and the ground truth. The "click" prediction loss may be used to adjust the weights of the machine learning model in order to train the model.
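
As a non-limiting illustration, the "click" prediction loss may be sketched as a binary cross-entropy between the predicted "click" probability and the logged outcome. The scalar `click_logit` stands in for the output of the task module applied to the [CLS] output. The description does not state how the MLM loss and the "click" prediction loss are combined; the weighted sum in `pretraining_loss` is an assumption made for this sketch.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def click_prediction_loss(click_logit, clicked):
    """Binary cross-entropy between the predicted click probability
    and the logged click (1 if the user selected the document, else 0)."""
    p = sigmoid(click_logit)
    eps = 1e-12  # guards against log(0)
    return -(clicked * math.log(p + eps)
             + (1 - clicked) * math.log(1.0 - p + eps))


def pretraining_loss(mlm_loss, click_loss, click_weight=1.0):
    """Assumed combination of the two losses: a weighted sum."""
    return mlm_loss + click_weight * click_loss


undecided = click_prediction_loss(0.0, 1)   # model predicts p = 0.5
confident = click_prediction_loss(5.0, 1)   # model predicts a likely click
mistaken = click_prediction_loss(-5.0, 1)   # model predicts an unlikely click
total = pretraining_loss(1.0, 0.5)
```

The loss is smallest when the predicted probability agrees with the logged outcome and grows as the prediction moves away from it, which is what drives the weight adjustment.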

[068] Although the "Clicks" data set, collected from operation logs, may serve as a proxy for relevance, it may not adequately reflect the actual relevance of a document to a query. This problem is addressed in the fine-tuning stage 406 by using the relevance data sets (described above) to train the machine learning model on documents that human assessors have manually labeled for their relevance to a query.

[069] In some embodiments of the invention, the fine-tuning stage 406 is first performed using the "Rel-Big" data set (as described above with reference to Fig. 3), which is both the largest and the oldest of the relevance data sets. The queries and documents are tokenized as described above and provided to the machine learning model as input. The machine learning model uses a neural network-based task module to generate a predicted relevance score. In some embodiments of the invention, the task module may determine the predicted relevance score based on the [CLS] output. The "Rel-Big" data set contains a relevance score determined by a human assessor, which may be used as the ground truth when training the machine learning model. This ground truth may be used, for example, to determine a cross-entropy loss representing the distance or difference between the predicted relevance score and the ground truth, and this loss may be used to adjust the weights of the machine learning model.

[070] In some embodiments of the invention, re-labeling the large "Clicks" data set and re-training the model on the re-labeled data set may be used during fine-tuning to improve the performance of the machine learning model. This may be done by applying the machine learning model, trained as described above to generate predicted relevance scores, to data objects from the "Clicks" data set, thereby re-labeling those data objects and forming an augmented "Clicks" data set with synthesized assessor labels. The augmented "Clicks" data set may then be used to re-train the machine learning model to predict relevance scores, with the synthesized assessor labels serving as the ground truth.
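
As a non-limiting illustration, the re-labeling step may be sketched as follows. The dictionary layout of the items and the function names are hypothetical, and `toy_teacher` is a deliberately trivial stand-in (a term-overlap ratio) for the fine-tuned machine learning model; it is not the model of the description.

```python
def relabel(clicks_dataset, teacher_score):
    """Attach a synthesized label to every item, keeping the click label.

    The input items are not modified; new dictionaries are returned.
    """
    augmented = []
    for item in clicks_dataset:
        label = teacher_score(item["query"], item["document"])
        augmented.append({**item, "synthesized_label": label})
    return augmented


def toy_teacher(query, document):
    """Hypothetical stand-in for the fine-tuned model: term-overlap ratio."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / len(q) if q else 0.0


clicks = [
    {"query": "cheap flights", "document": "cheap flights to rome", "click": 1},
    {"query": "cheap flights", "document": "hotel deals in rome", "click": 1},
]
augmented = relabel(clicks, toy_teacher)
```

Both items carry the same "click" label, yet they receive different synthesized labels. This is the purpose of the step: the large data set acquires relevance-style labels that the click signal alone does not provide, and the re-training then uses `synthesized_label` as the target.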

[071] It should be understood that such an approach, in which a first model is used to augment or label a data set that is then used to train a second model, may be applied to efficiently transfer to the second model the knowledge embedded in the first model. In effect, the first model becomes a "teacher" for the second model. Such transfer techniques may be used with different model architectures, so that the architecture of the first model differs from that of the second model. For example, the second model may be a smaller neural network than the first model and may provide substantially similar or even more accurate results, for example using fewer layers, and may therefore execute faster at the use stage.

[072] In some embodiments of the invention, such fine-tuning may be repeated using other data sets from among the relevance data sets. For example, the machine learning model may first be fine-tuned using the "Rel-Big" data set, then refined using the "Rel-Mid" data set, and then refined again using the "Rel-Small" data set. In some embodiments of the invention, all or some of these refinement steps may also include re-labeling the "Clicks" data set (or another large data set) and re-training the machine learning model, as described above.

[073] A machine learning model trained with this multi-stage approach may be viewed as producing a rough initial estimate of the relevance of a document to a query after the initial training on the "Clicks" data set and as improving this rough initial estimate at each subsequent fine-tuning stage. To determine the improvement over the initial estimate at each fine-tuning stage, a metric commonly used in ranking tasks may be employed, such as a metric based on normalized discounted cumulative gain (NDCG).
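
As a non-limiting illustration, a metric based on normalized discounted cumulative gain may be computed as follows. The exponential gain `2 ** rel - 1` and the logarithmic discount are one common formulation and are an assumption here; the description does not specify which variant is used.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance grades listed in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))


def ndcg(relevances_in_ranked_order, k=None):
    """DCG of the model's ranking divided by the DCG of the ideal ranking."""
    ranked = relevances_in_ranked_order[:k]
    ideal = sorted(relevances_in_ranked_order, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Assessor grades of the documents, in the order the model ranked them.
perfect_order = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
no_relevant_documents = ndcg([0, 0])
```

Comparing the value obtained after each fine-tuning stage on a held-out set of assessor-labeled queries gives one way to quantify the improvement over the initial estimate.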

[074] Fig. 5 is a flowchart 500 of a computer-implemented method for training a machine learning model according to various embodiments of the present technology. The flowchart 500 includes a first pre-training stage 570, a second pre-training stage 572, and a fine-tuning stage 574.

[075] In block 502 of the first pre-training stage 570, the processor obtains a set of unlabeled natural-language digital documents. In block 504, the processor converts digital documents from the set of unlabeled natural-language digital documents into a set of tokens, and one or more of the tokens are then masked.

[076] In block 506, the machine learning model is trained using the masked set of tokens as input. The outputs of the machine learning model corresponding to the masked tokens are used together with the actual masked tokens to determine a loss (e.g., a cross-entropy loss) that is used to adjust the weights of the machine learning model. It should be understood that blocks 504 and 506 may be repeated for all of the unlabeled natural-language digital documents or for a subset thereof. In some embodiments of the invention, the first pre-training stage 570 may be omitted, or training may begin with the second pre-training stage, for example using a "standard" pre-trained BERT model.

[077] In block 508 of the second pre-training stage 572, the processor obtains a first set of training digital objects. The training digital objects of the first set are associated with a past user action parameter. The past user action parameter represents an action of a past user with the training digital object, such as a "click" on a digital document that is associated with the training digital object and that is responsive to a query associated with the training digital object. In some embodiments of the invention, a training digital object is associated with a query comprising query text and query metadata, with a document comprising document text and document metadata, and with a past user action. The query metadata may include, for example, the geographic region from which the query was submitted. The document metadata may include, for example, a web address of the document, such as its URL, and a title of the document. In some embodiments of the invention, the query, including its metadata, may be included in the document metadata.

[078] In block 510, the processor converts the query and the digital document associated with the training digital object, including the metadata associated with the query and with the digital document, into tokens, and one or more of the tokens are masked to form input tokens. Such tokenization may be performed using a pre-built vocabulary of tokens, which in some embodiments of the invention may be determined using byte pair encoding.

[079] In block 512, the machine learning model is trained to determine a predicted user action parameter, such as the probability that a user will "click" on the document, indicating that the user considers the document relevant to the query. To do so, the predicted user action parameter and the past user action parameter are used to determine a loss that is used to adjust the weights of the machine learning model. In some embodiments of the invention, the machine learning model may additionally be trained on the input tokens to predict the masked tokens from the context provided by the neighboring tokens. The outputs of the machine learning model corresponding to the masked tokens are used together with the actual masked tokens to determine a loss that is used to adjust the weights of the machine learning model. As a result of training on these masked tokens, the predictions generated by the machine learning model may carry information indicative of a semantic relevance parameter, that is, of the degree of semantic relevance of the search query to the content of the input digital object. It should be understood that blocks 510 and 512 may be repeated for all objects of the set of training digital objects or for a subset thereof.

[080] In block 514 of the fine-tuning stage 574, the processor obtains a second set of training digital objects, in which a training digital object is associated with a search query, which may include metadata, with a digital document, which may include metadata, and with an assessor-generated label. The assessor-generated label indicates the degree of relevance of the training digital object (in particular, of the digital document in some embodiments of the invention) to the search query, as judged by the human assessor who assigned the label.

[081] In block 516, the processor trains the machine learning model to determine a synthesized assessor label for a training digital object. The synthesized assessor label is the machine learning model's prediction of how relevant the training digital object is to the search query. The training may be performed by providing the machine learning model with a tokenized representation of the training digital object (containing the search query and the document) and using the machine learning model to generate the synthesized assessor label. The synthesized assessor label and the assessor-generated label assigned by the human assessor are used to determine a loss, which may be used to adjust the weights of the machine learning model and thereby fine-tune it. It should be understood that block 516 may be repeated for all objects of the second set of training digital objects or for a subset thereof.

[082] In block 518, the machine learning model is further fine-tuned: the processor applies the machine learning model to all objects from the first set of training digital objects, or to a subset thereof, in order to augment the first set of training digital objects with synthesized evaluator labels and thereby form a first augmented set of training digital objects. In block 520, the machine learning model is fine-tuned using the first augmented set of training digital objects, substantially as described above with reference to block 516.
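The augmentation in block 518 amounts to running the fine-tuned model over the first set and attaching its output to each object. A minimal sketch follows, assuming that training digital objects are represented as dictionaries with `query` and `document` keys and that the model is any callable returning a score. The representation, the key names and the word-overlap `toy_model` are illustrative assumptions, not part of the description.

```python
def augment_with_synthesized_labels(model, first_set):
    # Returns a new list; the original objects are left unmodified.
    augmented = []
    for obj in first_set:
        labeled = dict(obj)
        labeled["synthesized_label"] = model(obj["query"], obj["document"])
        augmented.append(labeled)
    return augmented


def toy_model(query, document):
    # Stand-in for the fine-tuned model (an assumption): the fraction of
    # query words that also occur in the document.
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / len(q) if q else 0.0
```

The resulting list plays the role of the first augmented set of training digital objects used in block 520.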

[083] It should be understood that the fine-tuning step 574 may be repeated in whole or in part with other sets of training digital objects containing evaluator-generated labels in order to successively refine the machine learning model. For example, in some embodiments of the invention, after performing the fine-tuning described above, the processor receives a third set of training digital objects. A training digital object from this set is associated with a search query, which was used to generate the training digital object and may contain metadata, with a digital document, which may contain metadata, and with an evaluator-generated label. As before, the evaluator-generated label indicates the degree of relevance of the training digital object (in particular, of the digital document in some embodiments of the invention) to the search query from the point of view of the human evaluator who assigned the label. This additional set of training digital objects may differ from any other set of training digital objects used in the training described above or, for example, may be the same as the second set of training digital objects. Additionally, the size of the third set of training digital objects may differ from the size of the other sets of training digital objects used to train and/or fine-tune the machine learning model.

[084] The machine learning model is fine-tuned using this additional set of training digital objects, substantially as described above with reference to block 516. After this additional training, the model may be used to generate a refined relevance label.

[085] Fig. 6 is a flow chart 600 of using the fully trained machine learning model to rank search results. In block 602, the processor receives a set of use-stage digital objects. Each use-stage digital object is associated with a search query (including metadata) entered by a user and with a digital document (including metadata) provided in response to this query. For example, if the search engine found 75 documents matching the query, the set of use-stage digital objects will contain 75 use-stage digital objects, each of which contains the query (including metadata) and one of the documents (including metadata).
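The one-object-per-document structure described for block 602 can be sketched as follows. The dictionary layout, the key names and the function `build_use_stage_objects` are assumptions made for illustration only.

```python
def build_use_stage_objects(query, query_metadata, documents):
    # documents: list of (document_text, document_metadata) pairs returned
    # by the search engine for this query.
    return [
        {
            "query": query,
            "query_metadata": query_metadata,
            "document": text,
            "document_metadata": metadata,
        }
        for text, metadata in documents
    ]
```

With 75 matching documents, the function returns 75 objects that share the same query and differ only in the document.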

[086] In block 604, the processor tokenizes a use-stage digital object from the set of use-stage digital objects and uses the resulting tokens as input to the use-stage machine learning model. The use-stage machine learning model generates a relevance parameter for the use-stage digital object. The relevance parameter is the use-stage machine learning model's prediction of the relevance of the use-stage digital object (e.g., of the document associated with the use-stage digital object) to the query. The use-stage digital object is labeled with the relevance parameter. Block 604 may be repeated for all objects in the set of use-stage digital objects, or for a subset thereof, to form a labeled set of use-stage digital objects.

[087] In block 606, the labeled set of use-stage digital objects is ranked by relevance parameter. In some embodiments of the invention, this may be performed using another machine learning model that is pre-trained to rank the labeled set of use-stage digital objects using their relevance parameters as input features. In some embodiments of the invention, this other machine learning model may be a CatBoost decision-tree-based learning model.
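Blocks 604 and 606 together form a score-then-sort pipeline, sketched below under stated assumptions. The tokenizer and the model are passed in as callables, `toy_tokenize` and `toy_model` are hypothetical stand-ins, and the ranking is a plain descending sort by relevance parameter rather than the separately trained ranking model (such as CatBoost) mentioned for some embodiments.

```python
def label_and_rank(objects, tokenize, model):
    labeled = []
    for obj in objects:
        # Block 604: tokenize the object and label it with the prediction.
        tokens = tokenize(obj["query"], obj["document"])
        labeled.append(dict(obj, relevance=model(tokens)))
    # Block 606 (simplified): rank by relevance parameter, highest first.
    return sorted(labeled, key=lambda o: o["relevance"], reverse=True)


def toy_tokenize(query, document):
    # Hypothetical tokenizer with a separator between query and document.
    return query.lower().split() + ["[SEP]"] + document.lower().split()


def toy_model(tokens):
    # Stand-in for the use-stage model (an assumption): the fraction of
    # query tokens that also appear among the document tokens.
    sep = tokens.index("[SEP]")
    query_tokens, doc_tokens = set(tokens[:sep]), set(tokens[sep + 1:])
    return len(query_tokens & doc_tokens) / max(len(query_tokens), 1)
```

Because `sorted` is stable, objects with equal relevance parameters keep their original relative order.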

[088] It should also be understood that, although the embodiments of the invention presented herein are described with reference to specific features and structures, various modifications and combinations thereof may be implemented without departing from the scope of such technologies. For example, various optimizations applicable to neural networks, including transformers and/or the BERT model, may similarly be applied to the present technology. In addition, optimizations that speed up the determination of relevance at the use stage may also be applied. For example, in some embodiments of the invention, the transformer model may be split such that some transformer blocks are divided between query processing and document processing, so that document representations can be pre-computed offline and stored in a document search index. The description and drawings should be considered only as illustrative of the discussed embodiments or implementations of the invention, the principles of which are defined by the appended claims, which cover any modifications, changes, combinations and equivalents within the scope of the present invention.
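The offline pre-computation mentioned in paragraph [088] can be illustrated with a simplified sketch. It assumes a normalized bag-of-words encoder in place of the document-side transformer blocks and a dot product in place of any joint blocks. The function names and the index layout are hypothetical. The sketch only shows why separating document processing from query processing lets the document representations be built once and reused for every query.

```python
import math
from collections import Counter


def encode(text):
    # Toy encoder (an assumption): L2-normalized token counts as a sparse
    # vector keyed by token.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def build_document_index(documents):
    # Offline step: encode every document once and store the result.
    return {doc_id: encode(text) for doc_id, text in documents.items()}


def score_query(query, index):
    # Online step: only the query is encoded; the stored document
    # representations are reused as they are.
    q = encode(query)
    return {
        doc_id: sum(w * doc_vec.get(tok, 0.0) for tok, w in q.items())
        for doc_id, doc_vec in index.items()
    }
```

At query time the cost per document is a sparse dot product, since the document-side encoding was already done when the index was built.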

Claims

1. A computer-implemented method for training a machine learning model to rank use-stage digital objects generated using a use-stage search query, executed by a processor and comprising:

- receiving by the processor a first plurality of training digital objects, wherein each training digital object from the first plurality of training digital objects is associated with a parameter of past user actions indicating user actions of past users with said training digital object;

- training, at a first training stage and based on the first plurality of training digital objects, the machine learning model to determine a parameter of predicted user actions for a use-stage digital object, wherein the parameter of predicted user actions indicates user actions of future users with the use-stage digital object;

- receiving by the processor a second plurality of training digital objects, wherein each training digital object from the second plurality of training digital objects is associated with (a) a training search query used to form a training digital object from the second plurality of training digital objects, and (b) with a first label indicating the degree of relevance of an object from the second plurality of training digital objects to the training search query;

- training, at a second training stage following the first training stage and based on the second plurality of training digital objects, the machine learning model to determine a synthesized label of the use-stage digital object, the synthesized label indicating the degree of relevance of the use-stage digital object to the use-stage search query;

- applying, by the processor, the machine learning model to the first plurality of training digital objects to augment an object from the first plurality of training digital objects with a synthesized label, thereby forming a first augmented plurality of training digital objects; and

- training, based on the first augmented plurality of training digital objects, the machine learning model to determine a relevance parameter of the use-stage digital object, the relevance parameter indicating the degree of relevance of the use-stage digital object to the use-stage search query, wherein a training digital object from the first plurality of training digital objects contains an indication of a digital document associated with document metadata, and training the machine learning model based on the first plurality of training digital objects at the first training stage additionally includes:

- converting the document metadata into a text representation containing tokens;

- pre-processing the text representation to mask a plurality of tokens in it; and

- training, based on the first plurality of training digital objects, the machine learning model to determine a token from the plurality of masked tokens based on the context provided by neighboring tokens,

wherein the relevance parameter of the use-stage digital object additionally indicates a semantic relevance parameter, which indicates the degree of semantic relevance of the use-stage search query to the content of the use-stage digital object.

2. The method according to claim 1, characterized in that the document metadata contains at least one of the following: a training search query associated with an object from the first plurality of training digital objects, a title of the digital document, content of the digital document, and a web address associated with the digital document.

3. The method according to claim 1, characterized in that it further includes determining a parameter of past user actions associated with an object from the first plurality of training digital objects, based on data on clicks of past users.

4. The method according to claim 3, characterized in that the click data comprises data about at least one click of at least one past user made in response to sending a training search query associated with an object from the first plurality of training digital objects.

5. The method according to claim 1, characterized in that before training the machine learning model to determine the relevance parameter of the digital object of the use stage, it additionally includes:

- receiving by the processor a third plurality of training digital objects, wherein the object from the third plurality of training digital objects is associated (a) with a training search query used to form the object from the third plurality of training digital objects, and (b) with a second label indicating the degree of relevance of the object from the third plurality of training digital objects to the training search query;

- training, at a third training stage following the second training stage and based on the third set of training digital objects, the machine learning model to determine a refined synthesized label of the digital object of the use stage, indicating the degree of relevance of the digital object of the use stage to the search query of the use stage;

- applying a machine learning model by the processor to the first augmented set of training digital objects to augment an object from the first augmented set of training digital objects with a refined synthesized label and thereby forming a second augmented set of training digital objects; and

- training a machine learning model to determine the relevance parameter of a digital object of the use stage based on the second augmented set of training digital objects.

6. The method according to claim 5, characterized in that any one of the first set of training digital objects, the second set of training digital objects and the third set of training digital objects is at least partially different from any other of these sets.

7. The method according to claim 5, characterized in that one of the first set of training digital objects, the second set of training digital objects and the third set of training digital objects is larger in size than a subsequent one of these sets.

8. The method according to claim 1, characterized in that after training the machine learning model to determine the relevance parameter of the digital object of the use stage, it additionally includes:

- training, based on the third set of training digital objects, a machine learning model to determine the corresponding refined parameter of relevance of the digital object of the use stage, indicating the degree of relevance of the digital object of the use stage to the search query of the use stage.

9. The method according to claim 8, characterized in that any one of the first set of training digital objects, the second set of training digital objects and the third set of training digital objects is at least partially different from any other of these sets.

10. The method according to claim 8, characterized in that one of the first set of training digital objects, the second set of training digital objects and the third set of training digital objects is larger in size than a subsequent one of these sets.

11. The method according to claim 8, characterized in that the third set of training digital objects is identical to the second set of training digital objects.

12. The method according to claim 1, characterized in that at the first stage of training, the machine learning model is trained to determine a rough initial estimate of the relevance parameter of the digital object of the use stage, and at each subsequent stage of training, the machine learning model is trained with the aim of improving the rough initial estimate.

13. The method according to claim 10, characterized in that the improvement of the rough initial estimate is determined using a metric based on normalized discounted cumulative gain.

14. The method according to claim 1, characterized in that the machine learning model contains at least one learning model.

15. The method according to claim 14, characterized in that at least one learning model is a learning model based on a transformer.

16. The method according to claim 1, characterized in that the machine learning model contains at least two learning models, wherein the first model of the two learning models is trained to determine a synthesized label for a digital object of the use stage for the purpose of forming a first augmented set of training digital objects, and the second model of the two learning models is trained to determine the relevance parameter of a digital object of the use stage based on the first augmented set of training digital objects.

17. The method according to claim 16, characterized in that the first model of the two learning models differs from the second model.

18. The method according to claim 16, characterized in that the first model of the two learning models is a transformer-based learning model.

19. The method according to claim 1, characterized in that it additionally includes ranking digital objects of the use stage according to the relevance parameters associated with them.

20. The method according to claim 1, characterized in that it additionally includes ranking digital objects of the use stage based on the relevance parameters associated with them, including the use of another learning model trained to rank digital objects of the use stage using the relevance parameters generated by the machine learning model as input features.

21. The method according to claim 20, characterized in that the other learning model is a learning model based on CatBoost decision trees.

22. A system for training a machine learning model to rank digital objects of the use stage, generated using a search query of the use stage, comprising:

- a processor;

- a memory associated with the processor; and

- a machine learning training module located in the memory, executed by the processor and containing commands, whereby the processor, when executing these commands, is capable of performing the following actions:

- obtaining a first plurality of training digital objects, wherein the training digital object from the first plurality of training digital objects is associated with a parameter of past user actions indicating user actions of past users with the training digital object from the first plurality of training digital objects;

- obtaining a second set of training digital objects, wherein the training digital object from the second set of training digital objects is associated with (a) a training search query used to form an object from the second set of training digital objects, and (b) with a first label indicating the degree of relevance of the object from the second set of training digital objects to the training search query;

- applying the machine learning model to the first set of training digital objects to augment an object from the first set of training digital objects with a synthesized label, thereby forming a first augmented set of training digital objects; and

- training, based on the first augmented set of training digital objects, the machine learning model to determine the relevance parameter of a digital object of the use stage, indicating the degree of relevance of the digital object of the use stage to the search query of the use stage, wherein the training digital object from the first set of training digital objects contains an indication of a digital document associated with the document metadata, the machine learning training module contains additional commands, and the processor, when executing these commands, is capable of training the machine learning model at the first training stage based on the first set of training digital objects by:

- converting document metadata into its text representation containing tokens;

- training, based on a first set of training digital objects, a machine learning model to determine a token from several masked tokens based on the context provided by neighboring tokens,

23. The system according to claim 22, characterized in that the machine learning training module contains additional commands, and the processor, when executing these commands, is capable of performing the following actions before training the machine learning model to determine the relevance parameter of the digital object of the use stage:

- obtaining a third set of training digital objects, wherein an object from the third set of training digital objects is associated (a) with a training search query used to form an object from the third set of training digital objects, and (b) with a second label indicating the degree of relevance of an object from the third set of training digital objects to the training search query;

- applying a machine learning model to a first augmented set of training digital objects to augment an object from the first augmented set of training digital objects with a refined synthesized label and thereby forming a second augmented set of training digital objects; and