WO2021159640A1 - Drug recommendation method based on artificial intelligence, and related device

Info

Publication number: WO2021159640A1
Application number: PCT/CN2020/093297
Authority: WO
Inventors: 陈娴娴; 阮晓雯; 徐亮
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-02-13
Filing date: 2020-05-29
Publication date: 2021-08-19
Anticipated expiration: 2022-08-13
Also published as: CN111260448A; CN111260448B

Abstract

Provided are a drug recommendation method and system based on artificial intelligence, a computer device, and a computer-readable storage medium. The method comprises the following steps: collecting historical medical data to construct an original profile (S10); pre-processing the original profile and outputting a feature data set (S20); by taking the pre-processed feature data set as training data, respectively performing pre-training to obtain an XGBOOST recommendation model and a deep neural network recommendation model (S30); obtaining a first recommendation strength probability value and a second recommendation strength probability value by means of the XGBOOST recommendation model and the deep neural network recommendation model; linearly adding the first recommendation strength probability value and the second recommendation strength probability value of each target drug to obtain each target recommendation strength probability value (S70); and according to target recommendation strength probability values, screening out a preset number of pieces of top-ranked target drug information, and pushing the target drug information to a user. According to the method, model results of the two models are linearly added, so that the drug recommendation accuracy can be effectively improved.

Description

Artificial intelligence-based drug recommendation method and related equipment

This application claims priority to Chinese patent application No. CN2020100906810, filed with the Chinese Patent Office on February 13, 2020 and entitled "Artificial intelligence-based drug recommendation method and related equipment", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of smart medicine, and in particular to an artificial intelligence-based drug recommendation method, system, computer device, and computer-readable storage medium.

Background art

At present, patients usually buy medicines at physical retail outlets such as chain pharmacies: the patient describes the symptoms to a pharmacist, and the pharmacist selects corresponding medicines for the patient. This approach has two shortcomings. First, the skill of licensed pharmacists varies from pharmacy to pharmacy, so the recommended medicines are not necessarily suitable for the patient. Second, each store must employ several pharmacists, which drives labor costs up sharply. Many systems that recommend medicines to users have therefore emerged. Most existing drug recommendation systems essentially perform matching against the indications of the drugs themselves. This is feasible in theory, but it ignores the large body of specialized medication-history data.

Technical problem

The inventors recognized that the completeness and validity of medication-history data determine the accuracy of drug recommendation. How to improve the precision of drug recommendation is therefore the problem to be solved.

Technical solutions

The purpose of this application is to provide an artificial intelligence-based drug recommendation method, system, computer device, and computer-readable storage medium that can effectively improve the precision of drug recommendation.

To achieve this purpose, this application provides an artificial intelligence-based drug recommendation method comprising the following steps:

collecting historical medical data to construct an original profile;

preprocessing the original profile and outputting a feature data set;

using the preprocessed feature data set as training data to pre-train an XGBOOST recommendation model and a deep neural network recommendation model respectively;

obtaining the user's drug request information and inputting it into the XGBOOST recommendation model and the deep neural network recommendation model;

calculating, by the XGBOOST recommendation model and according to the drug request information, a first recommendation strength probability value of each target drug corresponding to the drug request information;

calculating, by the deep neural network recommendation model and according to the drug request information, a second recommendation strength probability value of each target drug corresponding to the drug request information;

linearly adding the first recommendation strength probability value and the second recommendation strength probability value of each target drug to obtain the target recommendation strength probability value of each target drug;

sorting the target drug information of the target drugs from high to low by the corresponding target recommendation strength probability value, selecting a preset number of top-ranked items of target drug information, and pushing that target drug information to the user.

This application also provides an artificial intelligence-based drug recommendation system, which includes:

a collection module, used to collect historical medical data to construct an original profile;

a preprocessing module, used to preprocess the original profile and output a feature data set;

a model training module, used to take the preprocessed feature data set as training data and pre-train an XGBOOST recommendation model and a deep neural network recommendation model respectively;

an input module, used to obtain the user's drug request information and input it into the XGBOOST recommendation model and the deep neural network recommendation model;

a processing module, used to calculate, through the XGBOOST recommendation model, the first recommendation strength probability value of each target drug corresponding to the drug request information; to calculate, through the deep neural network recommendation model, the second recommendation strength probability value of each target drug corresponding to the drug request information; and to linearly add the first and second recommendation strength probability values of each target drug to obtain the target recommendation strength probability value of each target drug;

an output module, used to sort the target drug information of the target drugs from high to low by the corresponding target recommendation strength probability value, select a preset number of top-ranked items of target drug information, and push that target drug information to the user.

This application also provides a computer device that includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the steps of the artificial intelligence-based drug recommendation method set out above are implemented.

This application further provides a computer-readable storage medium on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor, the steps of the artificial intelligence-based drug recommendation method set out above are implemented, including:

sorting the target drug information of the target drugs from high to low by the corresponding target recommendation strength probability value, selecting a preset number of top-ranked items of target drug information, and pushing that target drug information to the user.

Beneficial effect

This application combines the XGBOOST recommendation model with the deep neural network recommendation model and linearly adds the results of the two models. The precision of the recommendations is 200% higher than that of the non-model approach, and the rate at which the correct drug appears among the top three recommendations rises from 75% to 95%. Through feature expansion, this application extracts more implicit information from the historical medical data in the original profile to ensure the completeness of the feature values, and it uses saturation checking to screen out and delete feature values that have no training value, thereby forming a complete and effective feature data set. In addition, because the XGBOOST recommendation model and the deep neural network recommendation model are combined, the approach is highly automated and supports adaptive training and prediction, which improves the efficiency of drug recommendation.

Description of the drawings

Figure 1 is a flowchart of the artificial intelligence-based drug recommendation method of this application;

Figure 2 is a flowchart of the classification processing performed in step S10 of Figure 1 when the historical medical data involves non-standard drug names;

Figure 3 is a block diagram of the artificial intelligence-based drug recommendation system of this application;

Figure 4 is a schematic diagram of the hardware structure of a computer device for the artificial intelligence-based drug recommendation method of this application.

Reference signs:

1. Artificial intelligence-based drug recommendation system; 10. Collection module; 20. Preprocessing module;

30. Model training module; 40. Input module; 50. Processing module;

60. Output module; 2. Computer device; 21. Memory; 22. Processor.

Embodiments of the present invention

To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and do not limit it. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in this application, without creative work, fall within the scope of protection of this application.

As shown in Figure 1, the artificial intelligence-based drug recommendation method of this application comprises a process of pre-training to form drug recommendation models and a process of recommending drugs with those models. It includes the following steps:

Step S10: Collect historical medical data to construct an original profile;

Step S20: Preprocess the original profile and output a feature data set;

Step S30: Use the preprocessed feature data set as training data to pre-train an XGBOOST recommendation model and a deep neural network recommendation model respectively;

Step S40: Obtain the user's drug request information and input it into the XGBOOST recommendation model and the deep neural network recommendation model;

Step S50: The XGBOOST recommendation model calculates, according to the drug request information, the first recommendation strength probability value of each target drug corresponding to the drug request information;

Step S60: The deep neural network recommendation model calculates, according to the drug request information, the second recommendation strength probability value of each target drug corresponding to the drug request information;

Step S70: Linearly add the first recommendation strength probability value and the second recommendation strength probability value of each target drug to obtain the target recommendation strength probability value of each target drug;

Step S80: Sort the target drug information of the target drugs from high to low by the corresponding target recommendation strength probability value, select a preset number of top-ranked items of target drug information, and push that target drug information to the user.
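Steps S70 and S80 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the function names, the equal weights `w_xgb` and `w_dnn`, and the deep neural network scores are hypothetical, since the text says only that the two values are "linearly added". The XGBOOST scores reuse the example values given for step S50.

```python
from typing import Dict, List, Tuple


def combine_scores(
    xgb_scores: Dict[str, float],
    dnn_scores: Dict[str, float],
    w_xgb: float = 0.5,  # assumed weight; the source does not specify one
    w_dnn: float = 0.5,  # assumed weight; the source does not specify one
) -> Dict[str, float]:
    """Step S70: linearly combine the two per-drug probability values."""
    drugs = xgb_scores.keys() & dnn_scores.keys()
    return {d: w_xgb * xgb_scores[d] + w_dnn * dnn_scores[d] for d in drugs}


def top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Step S80: sort from high to low and keep the first k drugs."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# First recommendation strength values (from the step S50 example).
xgb = {"drug_1": 0.19, "drug_2": 0.55, "drug_3": 0.97}
# Second recommendation strength values (made up for illustration).
dnn = {"drug_1": 0.30, "drug_2": 0.40, "drug_3": 0.90}

ranked = top_k(combine_scores(xgb, dnn), k=2)
```

With these inputs the third drug ranks first and the second drug ranks second. Unequal weights would let one model dominate; whether the applicant weights the models at all is not stated.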

Figure 2 is a flowchart of the classification processing performed in step S10 of Figure 1 when the historical medical data involves non-standard drug names. If the drug names in the historical medical data are not strictly coded according to the national standard ICD10, the drugs cannot be located and classified accurately. This application therefore uses a text extraction and matching method for readable media to locate and match the non-standard drug names in the historical medical data, so that drugs with non-standard names can be located and classified accurately.

The classification of non-standard drug names includes the following steps:

Step S110: Split the non-standard drug name and the standard drug names into individual characters.

Step S120: Using the Word2Vec shallow neural network with the skip_gram algorithm, convert each character of the non-standard drug name and of the standard drug names into a character vector. For example, after Word2Vec training, the character "感" is converted into the vector [1.90334,2.9874,…,0.988], with dimension 5×1.

Step S130: Concatenate the character vectors vertically, following a pooling rule, to obtain the spliced vector of each non-standard and standard drug name. For example, "感冒药" (cold medicine) is converted into the vector [1.0334,10.51184,…,1.8115], with dimension 15×1.

Step S140: Calculate the vector distance between the non-standard drug name and each standard drug name.

Step S150: Find the standard drug name whose vector is closest to that of the non-standard drug name.

Step S160: Classify the non-standard drug name according to the classification of that closest standard drug name.
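The matching in steps S110 to S160 can be sketched as follows. The sketch makes several assumptions that the text does not confirm: the character vectors are a hand-written toy table standing in for trained Word2Vec output, they have 2 dimensions rather than the 5 in the example, names are zero-padded to a fixed length so that vectors are comparable, and Euclidean distance is used because the text does not name a distance metric. All function and variable names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def name_vector(name: str, char_vecs: Dict[str, Vector], dim: int, max_len: int) -> Vector:
    """Steps S110-S130: split into characters, look up each vector, concatenate."""
    out: Vector = []
    for ch in name[:max_len]:
        out.extend(char_vecs.get(ch, [0.0] * dim))  # unknown character -> zeros
    out.extend([0.0] * (dim * max_len - len(out)))  # zero-pad to a fixed length
    return out


def nearest_standard(
    name: str, standards: List[str], char_vecs: Dict[str, Vector], dim: int, max_len: int
) -> str:
    """Steps S140-S150: return the standard name at the smallest distance."""
    query = name_vector(name, char_vecs, dim, max_len)
    return min(
        standards,
        key=lambda s: math.dist(query, name_vector(s, char_vecs, dim, max_len)),
    )


# Toy character vectors (stand-ins for Word2Vec skip-gram output).
CHAR_VECS = {
    "感": [1.0, 0.0], "冒": [0.9, 0.1], "药": [0.0, 1.0],
    "灵": [0.1, 0.9], "止": [-1.0, 0.0], "咳": [-0.9, 0.2],
}
# Hypothetical classification of the standard names.
CATEGORY = {"感冒药": "cold remedies", "止咳药": "cough remedies"}

match = nearest_standard("感冒灵", list(CATEGORY), CHAR_VECS, dim=2, max_len=3)
category = CATEGORY[match]  # step S160: inherit the matched name's class
```

Here the non-standard name "感冒灵" is matched to the standard name "感冒药" and inherits its class. In practice the character vectors would come from a trained embedding model.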

In this embodiment, the original profile in step S10 includes basic information and medical record information. The data of the original profile comes from multiple different data sources, and the multi-dimensional features provided by these sources are not consistent.

The basic information includes fields such as the gender, age, blood type, marital status, and mobile phone number of the drug purchaser or patient.

The medical record information includes a summary of the medical record, such as current symptoms, duration of symptoms, severity of symptoms, surgical history, medical expenses, drugs, drug dosage forms, length of hospital stay, and inpatient medical orders. For acute diseases such as influenza, chickenpox, and hand-foot-and-mouth disease, it includes the medical record information, the corresponding time of onset, and the medication used. For chronic diseases such as hypertension, diabetes, and chronic obstructive pulmonary disease, it includes follow-up information such as the duration of illness and the medication used.

Different data sources contain different multi-dimensional features. For example, sentinel hospitals provide rich and plentiful features, usually including gender, age, blood type, marital status, diastolic blood pressure, systolic blood pressure, and various physical examination and laboratory test indicators. Small and medium-sized outpatient clinics and community hospitals may have only the basic information. Branded pharmacies have even less, possibly only the user's mobile phone number, the drugs, and the drug dosage forms.

While the historical medical data is collected in step S10, an association extraction method establishes table correspondences between the historical medical data and the data in an associated database, and the associated data in that database is collected to construct the original profile. The associated database is a database outside the medical field, for example a database from the insurance, finance, or technology field, and it may be an internal database or an external free database (such as a MySQL database). A multi-table primary key association extraction method then expands the information in the original profile across multiple dimensions and fills its gaps, producing a more complete original profile. The primary key is an ID, for example a person's social security account number or identity card number after data desensitization.

Take associated data from the financial field as an example. By analyzing and mining the data in the financial associated database, a spending power feature dimension is established for the user, which extends the personal profile. If the data in the financial associated database shows that the user has not taken part in wealth management involving financial derivatives and has not purchased insurance, the spending power feature is set to low. If the data shows that the user has taken part in wealth management involving many financial derivatives and has purchased several kinds of insurance, the spending power feature is set to high. The spending power feature is used as training data for the XGBOOST recommendation model and the deep neural network recommendation model. Models trained on data with low spending power recommend drugs that match that spending power, that is, inexpensive drugs that act more slowly. Models trained on data with high spending power recommend drugs that match that spending power, that is, expensive drugs with a pronounced effect.

While the historical medical data is collected in step S10, other associated data is also collected, and the information in the original profile is expanded and its gaps filled across multiple dimensions through the multi-table primary key association extraction method.
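The primary-key association and the derived spending power feature can be sketched in plain Python. The field names (`person_id`, `wealth_products`, `insurance_policies`), the thresholds for "many" products and "several" policies, and the handling of cases the text does not describe are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def spending_power(fin: Optional[Row]) -> Optional[str]:
    """Map a person's financial record to a spending power label (illustrative rule)."""
    if fin is None:
        return None  # no matching record in the associated database
    products = fin.get("wealth_products", 0)
    policies = fin.get("insurance_policies", 0)
    if products == 0 and policies == 0:
        return "low"
    if products >= 3 and policies >= 2:  # thresholds are assumptions
        return "high"
    return None  # cases the source text does not describe


def enrich(medical_rows: List[Row], financial_by_id: Dict[str, Row]) -> List[Row]:
    """Join on the shared (desensitized) primary key and append the derived feature."""
    return [
        {**row, "spending_power": spending_power(financial_by_id.get(row["person_id"]))}
        for row in medical_rows
    ]


medical = [
    {"person_id": "p1", "age": 65, "drug": "drug_2"},
    {"person_id": "p2", "age": 3, "drug": "drug_3"},
    {"person_id": "p3", "age": 40, "drug": "drug_1"},
]
financial = {
    "p1": {"wealth_products": 5, "insurance_policies": 3},
    "p2": {"wealth_products": 0, "insurance_policies": 0},
}
profile = enrich(medical, financial)
```

Rows without a matching financial record keep a missing value for the new feature, which would then be handled by the missing value filling stage of step S20.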

In this embodiment, the preprocessing in step S20 includes feature expansion, saturation checking and screening, missing value filling, data set heterogenization and expansion, feedback selection, and correlation screening.

Feature expansion is implemented as follows. The tsfresh algorithm is used for expansion and computes many intermediate statistical variables, such as variance, standard deviation, extreme values, and various means. For example, these statistical indicators are calculated for every pair of feature columns f1 and f2. After expansion, some columns have missing data, and some features have no meaningful distribution, for example a column that is entirely 0 or entirely -1. Such feature columns cannot discriminate between samples and are meaningless as input to the drug recommendation model, so they are deleted.

In this embodiment, the original profile comprises features in the form of an n×m matrix. When the original profile is preprocessed, the tsfresh algorithm simplifies the n×m feature matrix into a horizontal concatenation of n×1 vectors, for example {H_1, H_2, …, H_m}, and particle scanning is then performed. The particle scanning includes single-dimensional particle scanning and multi-dimensional particle scanning.

In this embodiment, when the original profile is preprocessed, the single-dimensional particle scan derives a single-dimensional expanded feature set L1 through sliding scanning, and the multi-dimensional particle scan derives a multi-dimensional expanded feature set L2 through sliding scanning. The single-dimensional expanded feature set, the multi-dimensional expanded feature set, and the original profile are then spliced together to obtain a feature data set that carries more information.

Take single-dimensional particle scanning as an example. Each H above is a vector of dimension n×1.

First, define an a×1 scanning particle with a window_size of k. When a single dimension is scanned, this a×1 particle with window_size k slides along it. At each position on the dimension, a function is applied to the a×1 block of scan data that the particle covers. The function may be a statistical indicator such as variance, standard deviation, extreme value, or mean, or a non-linear function such as tanh or relu. The result is the particle scan value. The window then slides to the next position and scans again, and this repeats until the whole column has been scanned. Each single feature column can thus derive several columns of features such as variance, standard deviation, and extreme values. The derived features are concatenated (concat) into the single-dimensional expanded feature set L1. The features in the original profile, combined with the derived single-dimensional expanded feature set, form the preprocessed feature data set.
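The single-dimensional sliding scan can be sketched as below. This is an illustrative reading of the description, not the applicant's code: the function name `sliding_scan`, the `stride` parameter, and the choice of statistics are assumptions. The sketch also leaves open how the derived values, of which there are fewer than n per column, are aligned with the n original rows, because the text does not say.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def sliding_scan(
    column: Sequence[float],
    window_size: int,
    funcs: Dict[str, Callable[[Sequence[float]], float]],
    stride: int = 1,  # assumed step between window positions
) -> Dict[str, List[float]]:
    """Slide a window along one feature column and apply each function to it."""
    derived: Dict[str, List[float]] = {name: [] for name in funcs}
    for start in range(0, len(column) - window_size + 1, stride):
        window = column[start:start + window_size]
        for name, fn in funcs.items():
            derived[name].append(fn(window))
    return derived


# Statistics named in the description; non-linear functions could be added too.
FUNCS = {
    "mean": statistics.fmean,
    "variance": statistics.pvariance,
    "max": max,
    "min": min,
}

column = [1.0, 2.0, 4.0, 7.0]  # one n x 1 feature vector H_i
features = sliding_scan(column, window_size=2, funcs=FUNCS)
```

For this column and a window of two, the derived means are 1.5, 3.0, and 5.5. Repeating the scan for every column and concatenating the results gives a structure comparable to the feature set L1 described above.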

Take multi-dimensional particle scanning as an example. A list C can be taken as the set of candidate dimensions, and c1, c2, and so on are taken from C in a loop. Suppose c2 is taken this time. The features of c2 columns are then scanned in a loop with a particle of shape d×c2, a new window_size_2 is defined, and the particle slides along the c2-dimensional vectors. A function is likewise applied to the scanned data.

In the end, each group of c2 features derives a group of derived features, and the c2 scan can be applied to all features in a loop. Other values in C, such as c1, c3, and so on, can also be used for scanning. The profiles obtained from the multi-dimensional particle scans are concatenated (concat) into the multi-dimensional expanded feature set L2. Finally, L1, L2, and the original profile are spliced together to obtain a profile W that carries more information. The features in the original profile, combined with the derived multi-dimensional expanded feature set, form the preprocessed feature data set.

Saturation checking and screening is implemented as follows. Each feature dimension is checked, and dimensions with very low saturation are deleted from the original profile. Take age as an example. Suppose there are 100 people in total, of whom 70 have an age record and 30 do not. The saturation of the age column is then 70%. Features with very low saturation are deleted where appropriate, because too much data is missing for them to retain useful information.
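A minimal sketch of the saturation check follows. The 0.5 threshold and the column names are assumptions; the text gives no cutoff and says only that features with very low saturation are deleted where appropriate. Missing values are represented as `None`, and columns are assumed to be non-empty.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def saturation(column: Column) -> float:
    """Fraction of records in the column that have a value."""
    return sum(v is not None for v in column) / len(column)


def drop_low_saturation(table: Dict[str, Column], threshold: float) -> Dict[str, Column]:
    """Keep only the columns whose saturation reaches the threshold."""
    return {name: col for name, col in table.items() if saturation(col) >= threshold}


table = {
    "age": [30.0] * 70 + [None] * 30,             # 70 of 100 people have an age
    "blood_type_code": [1.0] * 10 + [None] * 90,  # hypothetical sparse column
}
kept = drop_low_saturation(table, threshold=0.5)
```

The age column has a saturation of 0.7 and is kept, while the sparse column is dropped.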

Missing value filling is implemented as follows. Missing values are filled column by column using non-linear interpolation and the rpart method, and the filling scheme is selected by back-testing the results.

Data set heterogenization and expansion is implemented as follows, taking text extraction from readable media as an example. Suppose the medical history column for 1 million people contains diseases such as diabetes and hypertension. With so much text piled into a single feature column, the model cannot directly obtain usable information from it. Diabetes is therefore extracted from this column into a new feature column: each of the 1 million people who has diabetes is marked 1 in the new column, and each person who does not is marked 0. Other diseases and other cases are handled in the same way.
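The extraction of indicator columns from a free-text history column can be sketched as follows. The function name, the `has_` column prefix, and the use of simple substring matching are assumptions; real records would likely need normalization of disease names first.

```python
from typing import Dict, List, Optional


def disease_indicators(
    histories: List[Optional[str]], diseases: List[str]
) -> Dict[str, List[int]]:
    """Build one 0/1 column per disease from a free-text medical history column."""
    return {
        f"has_{d}": [1 if d in (h or "") else 0 for h in histories]
        for d in diseases
    }


histories = ["diabetes, hypertension", "hypertension", "", None]
columns = disease_indicators(histories, ["diabetes", "hypertension"])
```

Each new column has one entry per person, so it can be appended to the feature data set alongside the original columns.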

In this embodiment, the XGBOOST recommendation model and the deep neural network recommendation model are pre-trained in step S30 as follows. An initial deep neural network recommendation model and an initial XGBOOST recommendation model are first trained on the feature data set from step S20. After testing on a test data set and optimization, the optimized deep neural network recommendation model and XGBOOST recommendation model are obtained.

The output of the XGBOOST recommendation model and of the deep neural network recommendation model is the drugs to be recommended and their corresponding probability values.

The optimization means that, after the preprocessing of step S20 is complete, the XGBOOST recommendation model and the deep neural network recommendation model are built, and the parameters and hyperparameters of both models are adjusted according to the available computing resources and time. Thresholds are set for the parameters and hyperparameters, limiting their upper and lower bounds, to prevent an endless loop.

The feature data set from step S20 is used as training data to train the initial deep neural network recommendation model. The initial deep neural network recommendation model is a 5-layer neural network model that includes fully connected layers of 256, 128, and 64 neurons, with a Dropout layer added to each fully connected layer. The model computes partial derivatives through the back-propagation algorithm, following the chain rule, and is trained and tuned continuously in the TensorFlow deep learning framework to obtain the optimized deep neural network recommendation model. The feature data set from step S20 is also used as training data to train the initial XGBOOST recommendation model. The initial XGBOOST recommendation model uses its built-in feature importance to rank the feature factors by importance and screens out the factors with lower weights, which yields the optimized XGBOOST recommendation model.
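The importance-based screening of feature factors can be sketched independently of any training library. The importance scores, feature names, and the `min_weight` cutoff below are hypothetical; in practice the scores would be read from the trained gradient-boosting model, and the text does not state what counts as a "lower weight".

```python
from typing import Dict, List


def select_features(importance: Dict[str, float], min_weight: float) -> List[str]:
    """Rank features by importance (high to low) and drop those below min_weight."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, weight in ranked if weight >= min_weight]


# Hypothetical importance scores for four features.
importance = {
    "age": 0.41,
    "symptom_fever": 0.33,
    "blood_type": 0.02,
    "has_diabetes": 0.24,
}
kept = select_features(importance, min_weight=0.05)
```

The model would then be retrained on the retained features only, which is one plausible reading of how the optimized model is obtained.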

In this embodiment, the user's drug request information in step S40 can be obtained in several ways: voice input converted into standard-format text or codes, medical records scanned and recognized and then converted into standard-format text or codes, or key field information entered manually and then converted into standard-format text or codes. The converted standard-format text or codes serve as the drug request information. The drug request information that a user enters through the system includes age, symptoms, medical history, and so on, and the drug request information entered by different users is not entirely the same. For example, the drug request information of a first user includes an age of 3 years, a first symptom of a 39-degree fever for four days, a second symptom of a dry cough for two days, and a third symptom of thick nasal discharge for two days. The drug request information of a second user includes an age of 65 years, a symptom of dizziness, and a medical history of hypertension for ten years with medication for ten years.
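One way to picture the standard-format drug request information is as a small record type. The sketch below is an assumption for illustration only: the class names `Symptom` and `DrugRequest` and their fields are hypothetical and do not come from the application, which does not specify the standard format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Symptom:
    description: str
    duration_days: Optional[int] = None  # None when no duration was given


@dataclass
class DrugRequest:
    age_years: int
    symptoms: List[Symptom]
    medical_history: List[str] = field(default_factory=list)


# The two example users from the paragraph above.
first_user = DrugRequest(
    age_years=3,
    symptoms=[
        Symptom("fever of 39 degrees", 4),
        Symptom("dry cough", 2),
        Symptom("thick nasal discharge", 2),
    ],
)
second_user = DrugRequest(
    age_years=65,
    symptoms=[Symptom("dizziness")],
    medical_history=["hypertension for ten years", "medication for ten years"],
)
```

Because fields such as medical history are optional, the two requests carry different information, matching the statement that requests from different users are not entirely the same.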

In this embodiment, different drug request information in step S50 yields different first recommendation strength probability values when computed by the XGBOOST recommendation model. For example, based on the user's drug request information, the XGBOOST recommendation model computes a first recommendation strength probability value for each drug in the drug library: first drug, 0.19; second drug, 0.55; third drug, 0.97; and so on.

In this embodiment, different drug request information in step S60 yields different second recommendation strength probability values when computed by the deep neural network recommendation model. For example, based on the user's drug request information, the deep neural network recommendation model computes a second recommendation strength probability value for each drug in the drug library: first drug, 0.89; second drug, 0.15; third drug, 0.07; and so on.

In this embodiment, the first recommendation strength probability value and the second recommendation strength probability value in step S70 are linearly added according to their weight ratios, where the weight ratios are tuned through linear regression, finally giving the target recommendation strength probability value of the combined model. Suppose the weight ratio of the XGBOOST recommendation model is the first weight ratio and the weight ratio of the deep neural network recommendation model is the second weight ratio, with first weight ratio + second weight ratio = 1.

Target recommendation strength probability value of the first drug = first recommendation strength probability value of the first drug × first weight ratio + second recommendation strength probability value of the first drug × second weight ratio.

Target recommendation strength probability value of the second drug = first recommendation strength probability value of the second drug × first weight ratio + second recommendation strength probability value of the second drug × second weight ratio.

By analogy, the target recommendation strength probability values of all drugs in the drug library are calculated.
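The weighted linear addition can be sketched in a few lines. The application states only that the weight ratios are tuned through linear regression and gives no details, so the one-parameter least-squares fit in `fit_first_weight` is an assumption, as are the function names and the labelled targets it would be fitted against. The example scores are the ones quoted for steps S50 and S60.

```python
from typing import Dict, Sequence


def fit_first_weight(p1: Sequence[float], p2: Sequence[float],
                     y: Sequence[float]) -> float:
    """Least-squares estimate of w1 in y ~ w1*p1 + (1 - w1)*p2.

    Assumed tuning procedure; the result is clipped to [0, 1] so that
    both weight ratios stay non-negative and sum to 1.
    """
    num = sum((a - b) * (t - b) for a, b, t in zip(p1, p2, y))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        return 0.5  # the two models agree everywhere; any split works
    return min(1.0, max(0.0, num / den))


def combine(first: Dict[str, float], second: Dict[str, float],
            w1: float) -> Dict[str, float]:
    """Target value = first value * w1 + second value * (1 - w1)."""
    w2 = 1.0 - w1
    return {drug: w1 * first[drug] + w2 * second[drug] for drug in first}


first_scores = {"drug_1": 0.19, "drug_2": 0.55, "drug_3": 0.97}
second_scores = {"drug_1": 0.89, "drug_2": 0.15, "drug_3": 0.07}
target_scores = combine(first_scores, second_scores, w1=0.5)
```

With equal weight ratios the first drug, for instance, receives 0.19 × 0.5 + 0.89 × 0.5 = 0.54; a fitted first weight ratio would shift the result toward whichever model tracks the labelled outcomes more closely.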

In this embodiment, the specific screening method of step S80 may be: sort the target recommendation strength probability values in descending order, then select and save the top three drugs. The saved drug information includes the ranking and the name of each screened drug, and may optionally include its target recommendation strength probability value; the ranking, drug name, and target recommendation strength probability value correspond one to one. For example, the drug recommendation results for the first user are: rank 1, third drug, 0.97; rank 2, seventh drug, 0.92; rank 3, fiftieth drug, 0.89. The drug recommendation results for the second user are: rank 1, first drug, 0.89; rank 2, second drug, 0.83; rank 3, third drug, 0.81.
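The screening step amounts to a descending sort followed by a cut at the preset number. The following is a minimal sketch: the function name `top_k` and the drug identifiers are hypothetical, and the two low-scoring entries are filler values added only so that something is filtered out.

```python
from typing import Dict, List, Tuple


def top_k(target_scores: Dict[str, float],
          k: int = 3) -> List[Tuple[int, str, float]]:
    """Return (rank, drug, score) for the k highest target values."""
    ranked = sorted(target_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, drug, score)
            for rank, (drug, score) in enumerate(ranked[:k], start=1)]


# First user's example: the top three values come from the text above,
# the remaining two are made-up filler.
first_user_scores = {
    "drug_1": 0.40,
    "drug_3": 0.97,
    "drug_50": 0.89,
    "drug_2": 0.35,
    "drug_7": 0.92,
}
recommendations = top_k(first_user_scores, k=3)
```

Each returned tuple keeps rank, drug name, and probability value together, which mirrors the one-to-one correspondence required of the saved information.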

This application combines the XGBOOST recommendation model with the deep neural network recommendation model and linearly adds the results of the two models. The recommendation precision is 200% higher than that of a non-learning recommendation model, and the rate at which the correct drug appears among the top three recommendations rises from 75% to 95%. Through feature expansion, this application extracts more implicit information from the historical medical data in the original portrait to ensure the completeness of the feature values, and uses saturation exploration and screening to delete feature values that have no training significance, thereby forming a complete and effective feature data set. In addition, this application can be applied in smart healthcare; the combination of the XGBOOST recommendation model and the deep neural network recommendation model is highly automated and supports adaptive training and prediction, which improves the efficiency of drug recommendation.

Referring to FIG. 3, this application also provides an artificial intelligence-based drug recommendation system, which includes:

a collection module 10, configured to collect historical medical data to construct an original portrait;

a preprocessing module 20, configured to preprocess the original portrait and output a feature data set;

a model training module 30, configured to use the preprocessed feature data set as training data to separately pre-train an XGBOOST recommendation model and a deep neural network recommendation model;

an input module 40, configured to obtain a user's drug request information and input it into the XGBOOST recommendation model and the deep neural network recommendation model;

a processing module 50, configured to compute, through the XGBOOST recommendation model, a first recommendation strength probability value of each target drug corresponding to the drug request information; to compute, through the deep neural network recommendation model, a second recommendation strength probability value of each target drug corresponding to the drug request information; and to linearly add the first recommendation strength probability value and the second recommendation strength probability value of each target drug to obtain the target recommendation strength probability value of each target drug; and

an output module 60, configured to sort the target drug information of the target drugs by the corresponding target recommendation strength probability values from high to low, screen out a preset number of top-ranked pieces of target drug information, and push the target drug information to the user.

Referring to FIG. 4, this application also provides a computer device 2, which includes:

a memory 21, configured to store executable program code; and

a processor 22, configured to call the executable program code in the memory 21 to execute steps that include the artificial intelligence-based drug recommendation method described above.

FIG. 4 takes a single processor 22 as an example.

The memory 21, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the artificial intelligence-based drug recommendation method in the embodiments of this application. By running the non-volatile software programs, instructions, and modules stored in the memory 21, the processor 22 executes the various functional applications and data processing of the computer device 2, that is, implements the artificial intelligence-based drug recommendation method of any of the above method embodiments.

The memory 21 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store the user's historical medical data on the computer device 2. In addition, the memory 21 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 21 may optionally include memories 21 located remotely from the processor 22, and these remote memories 21 may be connected to the artificial intelligence-based drug recommendation system 1 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 21 and, when executed by the one or more processors 22, perform the artificial intelligence-based drug recommendation method of any of the above method embodiments, for example, the procedures of FIG. 1 and FIG. 2 described above.

The above product can execute the method provided in the embodiments of this application and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in detail in this embodiment, refer to the method provided in the embodiments of this application.

The computer device 2 of the embodiments of this application exists in various forms, including but not limited to:

(1) Mobile communication devices: these devices are characterized by mobile communication functions, and their main purpose is to provide voice and data communication. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it faces higher requirements in processing capacity, stability, reliability, security, scalability, and manageability.

(5) Other electronic devices with data interaction functions.

Another embodiment of this application further provides a non-volatile computer-readable storage medium that stores computer-executable instructions. When executed by one or more processors, such as the single processor 22 in FIG. 4, the computer-executable instructions cause the one or more processors 22 to perform the artificial intelligence-based drug recommendation method of any of the above method embodiments, for example, the procedures of FIG. 1 and FIG. 2 described above. The computer-readable storage medium may be non-volatile or volatile.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across at least two network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiments of this application. Those of ordinary skill in the art can understand and implement this without creative effort.

From the description of the above implementations, those of ordinary skill in the art can clearly understand that each implementation can be realized by software plus a general-purpose hardware platform, or alternatively by hardware. Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions directing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The serial numbers of the embodiments of this application are for description only and do not indicate the relative merits of the embodiments.

From the description of the above implementations, those skilled in the art can clearly understand that the methods of the embodiments can be realized by software plus the necessary general-purpose hardware platform, or alternatively by hardware, although in many cases the former is the better implementation.

The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims

An artificial intelligence-based drug recommendation method, which includes the following steps:

Collect historical medical data to construct original portraits;

Preprocess the original portrait and output the feature data set;

The pre-processed feature data set is used as the training data, and the XGBOOST recommendation model and the deep neural network recommendation model are pre-trained respectively;

Obtain the user's drug request information and input it into the XGBOOST recommendation model and the deep neural network recommendation model;

The XGBOOST recommendation model calculates and obtains the first recommendation strength probability value of the target drug corresponding to the drug request information according to the drug request information;

The deep neural network recommendation model calculates and obtains the second recommendation strength probability value of the target drug corresponding to the drug request information according to the drug request information;

Linearly adding the first recommendation strength probability value and the second recommendation strength probability value of each of the target drugs to obtain the target recommendation strength probability value of each of the target drugs;

The target drug information of each target drug is sorted according to the corresponding target recommendation strength probability value from high to low, a preset number of top-ranked pieces of target drug information is screened out, and the target drug information is pushed to the user.

The artificial intelligence-based drug recommendation method according to claim 1, wherein: the original portrait includes basic information and medical record information, the data source of the original portrait includes multiple different data sources, and the multi-dimensional features contained in the multiple different data sources are inconsistent.

The artificial intelligence-based drug recommendation method according to claim 1, wherein: while the historical medical data is collected, a table correspondence is established between the data in the historical medical data and the data in an associated database through an association extraction method, and the associated data in the associated database is collected to construct the original portrait.

The artificial intelligence-based drug recommendation method according to claim 1, wherein: the preprocessing includes feature expansion, saturation exploration and screening, missing value filling, data set heterogeneity and expansion, feedback selection, and correlation screening.

The artificial intelligence-based medicine recommendation method according to claim 1, wherein: the original portrait includes features of an n×m dimensional matrix, and the preprocessing of the original portrait includes:

Using the tsfresh algorithm, the features of the n×m dimensional matrix are simplified into a horizontal splicing of several n×1 dimensional vectors, and then particle scanning is performed. The particle scanning includes single-dimensional particle scanning and multi-dimensional particle scanning.

The artificial intelligence-based drug recommendation method according to claim 5, wherein the preprocessing of the original portrait comprises: the single-dimensional particle scanning derives a single-dimensional extended feature set L1 through sliding scanning, the multi-dimensional particle scanning derives a multi-dimensional extended feature set L2 through sliding scanning, and the single-dimensional extended feature set, the multi-dimensional extended feature set, and the original portrait are then spliced together to obtain a feature data set containing more information.

The artificial intelligence-based drug recommendation method according to claim 1, wherein: when the historical medical data is collected, the classification of non-standard drug names includes the following steps:

Separate the names of non-standard drugs and standard drugs;

Using a Word2Vec shallow neural network with the skip_gram algorithm, convert the characters in the non-standard drug names and in the standard drug names into word vectors;

Splice the vectors vertically using a pooling rule to obtain the spliced word vectors of the non-standard drug names and the standard drug names;

Calculate the vector distance between the non-standard drug name and the standard drug name;

Find the standard drug name with the closest vector distance to the non-standard drug name;

The non-standard drug names are classified according to the classification criteria of the standard drug names with the closest vector distance.

A drug recommendation system based on artificial intelligence, including:

Collection module, which is used to collect historical medical data to construct original portraits;

A preprocessing module, which is used to preprocess the original portrait and output a feature data set;

Model training module, which is used to pre-train the pre-processed feature data set as training data to obtain the XGBOOST recommendation model and the deep neural network recommendation model;

Input module, which is used to obtain the user's drug request information and input it into the XGBOOST recommendation model and the deep neural network recommendation model;

The processing module is used to calculate and obtain the first recommendation strength probability value of the target drug corresponding to the drug request information through the XGBOOST recommendation model, to calculate and obtain the second recommendation strength probability value of the target drug corresponding to the drug request information through the deep neural network recommendation model, and to linearly add the first recommendation strength probability value and the second recommendation strength probability value of each of the target drugs to obtain the target recommendation strength probability value of each of the target drugs;

The output module is used to sort the target drug information of each target drug according to the corresponding target recommendation strength probability value from high to low, screen out a preset number of top-ranked pieces of target drug information, and push the target drug information to the user.

The artificial intelligence-based drug recommendation system according to claim 8, wherein: the original portrait includes basic information and medical record information, the data source of the original portrait includes multiple different data sources, and the multi-dimensional features contained in the multiple different data sources are inconsistent.

The artificial intelligence-based drug recommendation system according to claim 8, wherein: the collection module is used to collect historical medical data and, at the same time, to establish a table correspondence between the data in the historical medical data and the data in an associated database through an association extraction method, and to collect the associated data in the associated database to construct the original portrait.

The artificial intelligence-based drug recommendation system according to claim 8, wherein: the preprocessing includes feature expansion, saturation exploration and screening, missing value filling, data set heterogeneity and expansion, feedback selection, and correlation screening.

The artificial intelligence-based medicine recommendation system according to claim 8, wherein: the original portrait includes features of an n×m dimensional matrix;

The preprocessing module is used to simplify the feature of the n×m dimensional matrix into a horizontal splicing of several n×1 dimensional vectors by using the tsfresh algorithm, and then perform particle scanning. The particle scanning includes single-dimensional particle scanning and multi-dimensional particle scanning.

The artificial intelligence-based drug recommendation system according to claim 12, wherein the particle scanning performed by the preprocessing module comprises: the single-dimensional particle scanning derives a single-dimensional extended feature set L1 through sliding scanning, the multi-dimensional particle scanning derives a multi-dimensional extended feature set L2 through sliding scanning, and the single-dimensional extended feature set, the multi-dimensional extended feature set, and the original portrait are then spliced together to obtain a feature data set containing more information.

A computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor, wherein: the processor executes the computer-readable instructions to implement the following steps of an artificial intelligence-based drug recommendation method:

Collect historical medical data to construct original portraits;

Preprocess the original portrait and output the feature data set;

The computer device according to claim 14, wherein: the original portrait includes basic information and medical record information, the data source of the original portrait includes multiple different data sources, and the multi-dimensional features included in the multiple different data sources are inconsistent.

The computer device according to claim 14, wherein: said collecting historical medical data to construct an original portrait comprises:

While the historical medical data is collected, a table correspondence is established between the data in the historical medical data and the data in an associated database through an association extraction method, and the associated data in the associated database is collected to construct the original portrait.

The computer device according to claim 14, wherein: the preprocessing includes feature expansion, saturation exploration and screening, missing value filling, data set heterogeneity and expansion, feedback selection, and correlation screening.

A computer-readable storage medium having computer-readable instructions stored thereon, wherein: when the computer-readable instructions are executed by a processor, the following steps of an artificial intelligence-based drug recommendation method are implemented:

Collect historical medical data to construct original portraits;

Preprocess the original portrait and output the feature data set;

The computer-readable storage medium according to claim 18, wherein: the original portrait includes basic information and medical record information, the data source of the original portrait includes multiple different data sources, and the multi-dimensional features included in the multiple different data sources are inconsistent.

The computer-readable storage medium according to claim 18, wherein: the preprocessing includes feature expansion, saturation exploration and screening, missing value filling, data set heterogeneity and expansion, feedback selection, and correlation screening.