CN116758525A - Medicine box real-time identification method and system based on deep learning
- Publication number
- CN116758525A (application CN202310718494.6A)
- Authority
- CN
- China
- Prior art keywords
- text
- recognition
- pill box
- pill
- box
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/60 — Image or video recognition or understanding; scenes, scene-specific elements; type of objects
- G06F16/2468 — Information retrieval; query processing; special types of queries; fuzzy queries
- G06N3/0464 — Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06V10/10 — Arrangements for image or video recognition or understanding; image acquisition
- G06V10/26 — Image preprocessing; segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region
- G06V10/82 — Arrangements using pattern recognition or machine learning; using neural networks
- G06V30/19007 — Character recognition; recognition using electronic means; matching; proximity measures
Abstract
The invention discloses a deep-learning-based method and system for real-time identification of medicine boxes, belonging to the technical field of medicine management. The method includes: mounting a camera above the medicine boxes to capture images of their appearance; using a pre-trained yolov5-based medicine box recognition model to obtain the box edges and an appearance-based recognition result; detecting text with a pre-trained DBNet-based text detection model to locate the text regions on each box; recognizing that text with a pre-trained text recognition model based on DenseNet and CTC to read the textual information on the box; and running each recognized word or phrase in turn through a low-threshold fuzzy search and match against a drug database. The invention effectively eliminates the manual verification otherwise required when medicines are dispensed in a pharmacy, greatly improves recognition accuracy, and reduces the workload of pharmacy staff.
Description
Technical field
The present invention relates to the technical field of drug management, and in particular to a deep-learning-based method and system for real-time identification of medicine boxes.
Background art
In recent years, with rapid economic growth, public demand for medical services has risen quickly, and automated, intelligent medical equipment has developed rapidly. While medical equipment and the skills of medical staff have improved greatly, the variety and quantity of medicines have also grown quickly. The traditional pharmacy service model relies mainly on manual labour to store and retrieve medicines: medicines are prepared and dispensed by professional pharmacists, patients hand prescriptions through a window, and the pharmacist is kept busy filling them, placed in a purely reactive dispensing role. This model has two main drawbacks. First, because of the wide variety of drugs, dispensing requires strict verification; the pharmacist's workload is enormous while the technical content of the work is not improved, which dilutes the pharmacist's professional skills and wastes valuable human and medical resources, and long hours of repetitive work also carry the safety risk of dispensing the wrong medicine. Second, this service model hinders communication between medical staff and patients: because pharmacists are busy filling prescriptions, they cannot offer patients more medication counselling, and the low efficiency of manual dispensing forces patients to queue, degrading the experience of seeking care. Automating and informatizing the pharmacy is therefore the reform trend for outpatient pharmacies and a safeguard for further improving the quality of medical care.
With the development of computer technology, deep learning has come into wide use. Image recognition can detect and identify medicine boxes accurately and thereby enable automated management of medicines. However, although recognition technology has advanced, medicine boxes differ little in shape and colour yet come in a great many varieties, so current image recognition algorithms for medicine boxes perform poorly and are far from accurate enough for actual pharmacy use. Improving the performance of the medicine box recognition algorithm is therefore key to the verification process of a smart pharmacy, and the quality of recognition directly affects patient medication safety.
Summary of the invention
The technical task of the present invention is to address the above shortcomings by providing a deep-learning-based method and system for real-time identification of medicine boxes that can accurately identify the type and quantity of medicines, speed up dispensing in the pharmacy, and reduce labour costs.
The technical solution adopted by the present invention to solve its technical problem is as follows:
A deep-learning-based method for real-time identification of medicine boxes, comprising the following steps:
mounting a camera above the medicine boxes and capturing images of their appearance;
detecting and identifying the medicine boxes with a pre-trained yolov5-based medicine box recognition model to obtain the box edges and an appearance-based recognition result;
cropping the original image along the obtained box edges to produce multiple images, each containing a single medicine box; for each cropped image, performing text detection with a pre-trained DBNet-based text detection model to locate the text regions on the box;
for each detected text region, performing text recognition with a pre-trained text recognition model based on DenseNet and CTC to read the textual information on the box;
running each word or phrase of the text recognition result in turn through a low-threshold fuzzy search and match against a drug database, the search and match covering the drug name, brand, manufacturer and dosage attributes, to obtain one or more drug numbers;
looking up the unique drug number obtained from appearance recognition among the drug numbers returned by the text-based query; if the lookup succeeds, outputting the recognition result;
if the lookup fails, running each word or phrase of the text recognition result in turn through an intelligent fuzzy search and match against the drug database to obtain the final medicine box recognition result.
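The decision logic of the last three steps can be sketched in Python as follows. This is a minimal illustration rather than the claimed implementation: `search_fuzzy`, `smart_match`, `appearance_id` and `ocr_words` are hypothetical placeholders for the database interface and the model outputs described above.

```python
def identify_medicine_box(appearance_id, ocr_words, drug_db):
    """Cross-check the appearance-based drug number against text-based candidates.

    appearance_id: unique drug number predicted by the yolov5 appearance model.
    ocr_words:     list of words/phrases returned by the text recognition model.
    drug_db:       drug database object (hypothetical interface).
    """
    # Low-threshold fuzzy search over name/brand/manufacturer/dosage attributes.
    text_candidates = drug_db.search_fuzzy(ocr_words, threshold=0.5)

    # If the appearance result is confirmed by the text result, output it directly.
    if appearance_id in text_candidates:
        return appearance_id

    # Otherwise fall back to the adaptive ("intelligent") fuzzy matching on text alone.
    return drug_db.smart_match(ocr_words)
```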
Further, before the step of mounting a camera above the medicine boxes and capturing images of their appearance, the method includes:
training all of the neural network models used for appearance recognition and text recognition, namely the yolov5-based medicine box appearance recognition model, the DBNet-based text detection model, and the DenseNet+CTC-based text recognition model.
The step of mounting a camera above the medicine boxes and capturing images of their appearance includes:
placing the medicine boxes inside an information collection box with a camera mounted above it and taking a top-down photograph of the boxes, the inner walls of the collection box being made of a non-reflective black material and a stable light source being provided;
detecting and identifying the medicine boxes with the pre-trained yolov5-based recognition model to obtain the box edges and the appearance-based unique drug number of each box;
cropping the original image along the obtained box edges to produce multiple images, each containing a single medicine box;
for each cropped image, performing text detection with the pre-trained DBNet-based text detection model to locate the text regions on the box;
for each detected text region, performing text recognition with the pre-trained text recognition model based on DenseNet and CTC to read the textual information on the box.
Further, training the yolov5-based medicine box appearance recognition model includes the following steps:
placing the medicine boxes inside the information collection box with a movable camera mounted above it, the inner walls of the collection box being made of a non-reflective black material and a stable light source being provided;
photographing every medicine to be collected from a top-down angle to obtain sufficient appearance images of the boxes; to guarantee training quality, no fewer than 20 valid images should be collected per box;
annotating the collected box images with regions and categories using a labelling tool;
loading the yolov5s pre-trained model;
augmenting the box appearance data during training by means such as mosaic data augmentation, and training for an appropriate number of epochs until the yolov5-based neural network model converges.
Further, the steps of training the DBNet-based text detection model and the DenseNet+CTC-based text recognition model include:
using a public data set of about 3.64 million images, split 99:1 into a training set and a validation set; the data are generated randomly from a Chinese corpus (news plus classical Chinese) with variations in font, size, grey level, blur, perspective and stretching, cover 5,990 characters in total (Chinese characters, English letters, digits and punctuation), contain a fixed 10 characters per sample randomly cut from sentences in the corpus, and have a uniform image resolution of 280×32;
synthesizing text images of commonly used drugs with a text synthesis program to enlarge the training data;
loading pre-trained models trained on other large text recognition data sets;
training the DBNet text detection model and the DenseNet+CTC text recognition model on the data set until the models converge.
Further, in the step of detecting and identifying the medicine boxes with the pre-trained yolov5-based recognition model to obtain the box edges and the appearance-based recognition result, the yolov5 neural network model is built with yolov5s and consists of five parts: input, backbone, neck, head and output;
the input image size is 640×640×3, and the image is preprocessed with strategies such as Mosaic data augmentation, adaptive anchor box calculation and image scaling;
yolov5 uses CSPDarknet53 as the model backbone, comprising the Focus, Conv, C3 and SPP modules, whose role is to extract rich semantic features from the input image; the neck uses FPN and PAN to build feature pyramids and strengthen detection of multi-scale targets; the head makes predictions on the features passed from the neck and generates feature maps at three different scales;
the Conv module has the structure Conv2d+BN+SiLU, i.e. a convolution layer, a normalization operation and an activation function in sequence; the Focus module is intended to reduce the computation of the model and speed up network training: the 3×640×640 input image is first split into four slices of size 3×320×320 each, the four slices are then concatenated along the channel dimension to give a 12×320×320 feature map, and a further convolution finally yields a 32×320×320 feature map;
the C3 module consists of two branches: in the first branch the input feature map passes through three consecutive Conv modules and several stacked Bottleneck modules, while in the second branch the feature map passes through only one Conv module, and the two branches are finally concatenated along the channel dimension; the Bottleneck module, which better extracts high-level features of the target, consists mainly of two consecutive convolutions and one residual operation;
the SPP module is a spatial pyramid pooling module used to enlarge the receptive field of the network; in yolov5s the input feature map of the SPP module is 512×20×20, the number of channels is halved after one Conv module, max-pooling operations with 5×5, 9×9 and 13×13 kernels are then applied, the three resulting feature maps are concatenated with the input feature map along the channel dimension and passed through another Conv module, and the final output feature map is 512×20×20.
Further, in the step of performing text detection with the pre-trained DBNet-based text detection model, the text detection model consists of the DBNet text detection network, which uses differentiable binarization as a text detector built on a simple segmentation network. The differentiable binarization formula is shown in Equation 1:
B(i,j) = 1 / (1 + e^(−k·(P(i,j) − T(i,j))))    (1)
where B denotes the approximate binary map, P denotes the probability map produced by the segmentation network, T is the threshold feature map learned by the network, and k is an amplification factor; the empirical value 50 is used here so that foreground and background can be distinguished more clearly.
Further, the text recognition model based on DenseNet and CTC used for text recognition consists of a DenseNet neural network comprising a CNN layer, an RNN layer and a CTC transcription layer;
from bottom to top, the layers of the text recognition model function as follows: the convolutional layer (CNN) extracts features from the input image, the recurrent layer (RNN) predicts the distribution of the feature sequence output by the CNN layer, and the transcription layer (CTC) filters redundant data from the sequence labels;
the DenseNet network uses ReLU as its activation function and performs its computation with three Dense Block layers connected to one another through Transition structures; the resulting DenseNet network is trained with CTC loss to obtain the final model;
the DenseNet network uses skip concatenation to preserve the original features and reduce vanishing gradients; however, as the network deepens, the number of channels and parameters grows, making it hard for the model to extract deep features, so DenseNet inserts a Transition module after each Dense Block mainly to reduce the number of channels; in addition, a bottleneck structure is added before each Dense Block concatenation to reduce the channel count further, and pooling reduces the parameters before they are passed to the next Dense Block, so that higher accuracy is achieved.
Further, the step of running each word or phrase of the text recognition result in turn through a low-threshold fuzzy search and match against the drug database, the search covering attributes such as the drug name, manufacturer and dosage, to obtain one or more drug numbers is as follows:
because text recognition on a medicine box may return incomplete text, the matching threshold of the fuzzy match is kept low, namely 0.5, i.e. a match is considered successful if half of the characters are the same; text recognition may return several words or phrases, which are used for a low-threshold fuzzy search and match in the drug database; the drug name is matched first, and if the name matches, attributes such as manufacturer and dosage are then matched within the already matched drugs; if an attribute matches no drug at all, that match is deemed to have failed; a failed match does not narrow the matching scope but simply skips that attribute; finally, one or more drug numbers are obtained through the fuzzy search and match.
Further, the unique drug number obtained from appearance recognition is looked up among the drug numbers returned by the text-based query; if the lookup succeeds, the medicine box recognition result is output directly; if the lookup fails, each word or phrase of the text recognition result is matched intelligently against the drug database in turn to obtain the final medicine box recognition result:
this covers the case where, because the appearance-based result is wrong, no drug can be found among the results returned by text recognition; in that case the box information is confirmed entirely by text recognition; each word or phrase of the text recognition result is first matched against the drug name attribute in the database using the intelligent matching algorithm; once matching drugs are obtained, attributes such as brand, manufacturer and dosage are matched in turn within those drugs using the same algorithm, and a single drug number is finally obtained;
the intelligent matching algorithm is as follows:
the drug name attribute is matched first with an initial matching threshold of 1, i.e. a match succeeds only on an exact match; if no drug is matched at all, the match is deemed to have failed; on failure the matching threshold is lowered by 0.1 until a match succeeds; after the first successful match at a threshold other than 1, the threshold is raised in steps of 0.01 until matching fails; after that failure, the result of the last successful match before the failure is taken as the final matching result.
The present invention also claims a deep-learning-based system for real-time identification of medicine boxes, comprising an image acquisition device and a data processing device,
the image acquisition device being used to capture images of the appearance of the medicine boxes;
the data processing device being used to identify the captured medicine box appearance images in real time on the basis of yolov5 and text recognition, and comprising a yolov5-based medicine box recognition module, a DBNet-based text detection module, a text recognition module based on DenseNet and CTC, and a medicine box identification module based on text recognition and appearance recognition;
the system being capable of implementing the above deep-learning-based method for real-time identification of medicine boxes.
The beneficial effects achieved by the present invention are as follows:
The invention discloses a deep-learning-based method for real-time identification of medicine boxes. An image of the boxes is captured with a camera; a pre-trained yolov5-based recognition model detects and identifies the boxes and yields the box edges and an appearance-based recognition result; the original image is cropped along the obtained edges into multiple images, each containing a single box; for each cropped image, text detection is performed with a pre-trained DBNet-based model to locate the text regions; for each detected region, text recognition is performed with a pre-trained model based on DenseNet and CTC to read the textual information on the box; each word or phrase of the text recognition result is run in turn through a low-threshold fuzzy search and match against a drug database covering the drug name, brand, manufacturer and dosage, yielding one or more drug numbers; the unique drug number from appearance recognition is then looked up among the drug numbers returned by the text-based query; if the lookup succeeds, the recognition result is output; if it fails, each word or phrase of the text recognition result is run in turn through an intelligent fuzzy search and match against the database, and the final medicine box recognition result is obtained. By combining image recognition and text recognition, the invention greatly improves recognition accuracy and safeguards patient medication safety. It effectively eliminates the manual verification otherwise required when medicines are dispensed in a pharmacy, reduces the workload of pharmacy staff, and ensures the safety of patients' medication.
Brief description of the drawings
Figure 1 is a schematic flowchart of a deep-learning-based method for real-time identification of medicine boxes provided by an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings and specific embodiments so that those skilled in the art can better understand and implement it; the embodiments given do not limit the present invention, and, where no conflict arises, the embodiments of the present invention and the technical features within them may be combined with one another.
As shown in Figure 1, an example of the present invention proposes a deep-learning-based method for real-time identification of medicines, comprising the following steps:
S100: mount a camera above the medicine boxes and capture images of their appearance.
The medicine boxes are placed inside an information collection box with a camera mounted above it; to reduce the interference of platform reflections with the recognition result, the inner walls of the collection box are made of a non-reflective black material and a stable industrial light source is provided inside.
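A minimal capture sketch in Python with OpenCV is given below; the camera index and output file name are placeholder assumptions, and in practice the industrial camera's own SDK may be used instead.

```python
import cv2

# Placeholder camera index and output path; the camera is mounted above the collection box.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()                 # grab one top-down frame of the boxes
if ok:
    cv2.imwrite('capture.jpg', frame)  # this image is fed to the yolov5 detector below
cap.release()
```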
S200: detect and identify the medicine boxes with the pre-trained yolov5-based recognition model to obtain the box edges and the appearance-based unique drug number of each box.
The yolov5 neural network model is built with yolov5s and consists of five parts: input, backbone, neck, head and output.
The input image size is 640×640×3, and the image is preprocessed with strategies such as Mosaic data augmentation, adaptive anchor box calculation and image scaling.
yolov5 uses CSPDarknet53 as the model backbone, comprising the Focus, Conv, C3 and SPP modules, and the neck uses FPN and PAN to build feature pyramids.
The Conv module has the structure Conv2d+BN+SiLU, i.e. a convolution layer, a normalization operation and an activation function in sequence. The Focus module first splits the 3×640×640 input image into four slices of size 3×320×320 each, then concatenates the four slices along the channel dimension to give a 12×320×320 feature map, and a further convolution finally yields a 32×320×320 feature map.
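A PyTorch sketch of the Conv and Focus modules as just described is given below; it illustrates only the slicing and the channel dimensions and is not taken from the yolov5 source code.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv2d + BatchNorm + SiLU, as described for the Conv module."""
    def __init__(self, c_in, c_out, k=1, s=1, p=0):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Focus(nn.Module):
    """Slice the input into 4 sub-images, concatenate on channels, then convolve."""
    def __init__(self, c_in=3, c_out=32):
        super().__init__()
        self.conv = Conv(c_in * 4, c_out, k=3, s=1, p=1)

    def forward(self, x):                      # x: (N, 3, 640, 640)
        x = torch.cat([x[..., ::2, ::2],       # four interleaved slices,
                       x[..., 1::2, ::2],      # each (N, 3, 320, 320)
                       x[..., ::2, 1::2],
                       x[..., 1::2, 1::2]], dim=1)  # -> (N, 12, 320, 320)
        return self.conv(x)                    # -> (N, 32, 320, 320)

# Example: Focus()(torch.randn(1, 3, 640, 640)).shape == (1, 32, 320, 320)
```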
The C3 module consists of two branches: in the first branch the input feature map passes through three consecutive Conv modules and several stacked Bottleneck modules, while in the second branch the feature map passes through only one Conv module; the two branches are finally concatenated along the channel dimension. The Bottleneck module consists mainly of two consecutive convolutions and one residual operation.
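A corresponding sketch of the Bottleneck and C3 modules follows, reusing the Conv helper from the previous sketch; the exact channel arithmetic and layer counts of yolov5s are simplified here.

```python
import torch
import torch.nn as nn
# Conv is the Conv2d + BatchNorm + SiLU helper defined in the previous sketch.

class Bottleneck(nn.Module):
    """Two consecutive convolutions plus a residual connection."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = Conv(c, c, k=1)
        self.cv2 = Conv(c, c, k=3, p=1)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C3(nn.Module):
    """Branch 1: Conv + stacked Bottlenecks; branch 2: a single Conv; concatenated by channel."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_hid = c_out // 2
        self.cv1 = Conv(c_in, c_hid, k=1)                    # entry of branch 1
        self.m = nn.Sequential(*(Bottleneck(c_hid) for _ in range(n)))
        self.cv2 = Conv(c_in, c_hid, k=1)                    # branch 2
        self.cv3 = Conv(2 * c_hid, c_out, k=1)               # fuse after concatenation

    def forward(self, x):
        return self.cv3(torch.cat([self.m(self.cv1(x)), self.cv2(x)], dim=1))
```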
The SPP module is a spatial pyramid pooling module. Its input feature map is 512×20×20; the number of channels is halved after one Conv module, max-pooling operations with 5×5, 9×9 and 13×13 kernels are then applied, the three resulting feature maps are concatenated with the input feature map along the channel dimension and passed through another Conv module, and the final output feature map is 512×20×20.
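A sketch of the SPP module with the 5×5, 9×9 and 13×13 pooling kernels and the channel sizes given above, again reusing the Conv helper from the Focus sketch:

```python
import torch
import torch.nn as nn
# Conv is the Conv2d + BatchNorm + SiLU helper from the Focus sketch above.

class SPP(nn.Module):
    """Spatial pyramid pooling with 5x5, 9x9 and 13x13 max-pooling kernels."""
    def __init__(self, c_in=512, c_out=512):
        super().__init__()
        c_hid = c_in // 2                                   # channels halved first
        self.cv1 = Conv(c_in, c_hid, k=1)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in (5, 9, 13))
        self.cv2 = Conv(c_hid * 4, c_out, k=1)              # fuse the 4 concatenated maps

    def forward(self, x):                                   # x: (N, 512, 20, 20)
        x = self.cv1(x)                                     # (N, 256, 20, 20)
        x = torch.cat([x] + [p(x) for p in self.pools], dim=1)  # (N, 1024, 20, 20)
        return self.cv2(x)                                  # (N, 512, 20, 20)
```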
The yolov5-based medicine box appearance recognition model is trained as follows:
The medicine boxes are placed inside the information collection box with a movable camera mounted above it; the inner walls of the collection box are made of a non-reflective black material and a stable light source is provided.
A camera with good imaging performance is chosen. All medicines to be collected are photographed from a top-down angle to obtain sufficient appearance images of the boxes; to guarantee training quality, no fewer than 20 valid images should be collected per box, and to ensure effective recognition, the viewing angles of the collected images should cover the angles that occur when photographs are captured in step S100.
The collected box images are annotated with regions and categories using a labelling tool; the annotations should fit tightly to the box edges.
The yolov5s pre-trained model is loaded.
The box appearance data are augmented during training by means such as mosaic data augmentation, and training proceeds for an appropriate number of epochs until the yolov5-based neural network model converges.
After model training is complete, the trained yolov5-based recognition model is used in this step to detect and identify the captured medicine box photographs and obtain the box edges and the appearance-based unique drug number of each box.
S300: crop the original image along the obtained box edges to produce multiple images, each containing a single medicine box.
The original image is cropped along the box edges detected by yolov5 to obtain multiple box images, each containing a single medicine box.
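Assuming the trained detector is exported as yolov5 weights, the detection and cropping of S200–S300 can be sketched as follows; the weight and image file names are placeholders, and the torch.hub interface of the public ultralytics/yolov5 repository is assumed.

```python
import torch
from PIL import Image

# Hypothetical weights file; the patent fine-tunes yolov5s on its own medicine box data.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='pillbox_yolov5s.pt')

img = Image.open('capture.jpg')            # top-down photo from the collection box
results = model(img)

crops = []
for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
    # One crop per detected box; cls maps to the appearance-based drug number.
    crops.append((int(cls), img.crop((int(x1), int(y1), int(x2), int(y2)))))
```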
S400: for each cropped box image, perform text detection with the pre-trained DBNet-based text detection model to locate the text regions on the box.
The text detection model consists of the DBNet text detection network, which mainly uses differentiable binarization as a text detector built on a simple segmentation network; the differentiable binarization formula is given in Equation 1:
B(i,j) = 1 / (1 + e^(−k·(P(i,j) − T(i,j))))    (1)
where B denotes the approximate binary map, P denotes the probability map produced by the segmentation network, T is the threshold feature map learned by the network, and k is an amplification factor (the empirical value 50 is used).
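Equation 1 is a scaled sigmoid and can be expressed directly in PyTorch; the snippet below is a one-line illustration of the formula, not the full DBNet detection head.

```python
import torch

def differentiable_binarization(P, T, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T)))  (Equation 1).

    P: probability map predicted by the segmentation head.
    T: learned threshold map.
    k: amplification factor (the empirical value 50 from the description).
    """
    return torch.sigmoid(k * (P - T))
```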
S500: for each detected text region, perform text recognition with the pre-trained text recognition model based on DenseNet and CTC to read the textual information on the box.
The text recognition model based on DenseNet and CTC consists of a DenseNet neural network comprising a CNN layer, an RNN layer and a CTC transcription layer.
From bottom to top, the layers function as follows: the convolutional layer (CNN) extracts features from the input image, the recurrent layer (RNN) predicts the distribution of the feature sequence output by the CNN layer, and the transcription layer (CTC) filters redundant data from the sequence labels.
The DenseNet network uses ReLU as its activation function and performs its computation with three Dense Block layers connected through Transition structures; the resulting network is trained with CTC loss to obtain the final model.
The DenseNet network uses skip concatenation to preserve the original features and reduce vanishing gradients; however, as the network deepens, the number of channels and parameters grows, making it hard to extract deep features, so DenseNet inserts a Transition module after each Dense Block mainly to reduce the number of channels; a bottleneck structure is also added before each Dense Block concatenation to reduce the channel count further, and pooling reduces the parameters before they are passed to the next Dense Block, so that higher accuracy is achieved.
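The data flow of such a recognizer (image → feature sequence → per-step class scores → CTC loss) can be sketched as follows. The toy CNN below merely stands in for the DenseNet backbone, the RNN layer is omitted, and the 32×280 input size and 5,990-character vocabulary follow the training data described below.

```python
import torch
import torch.nn as nn

class ToyCTCRecognizer(nn.Module):
    """Toy CNN feature extractor + per-column classifier trained with CTC loss."""
    def __init__(self, n_classes=5990 + 1):          # +1 for the CTC blank symbol
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                          # 32x280 -> 16x140
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                     # 16x140 -> 8x140
        )
        self.fc = nn.Linear(128 * 8, n_classes)

    def forward(self, x):                             # x: (N, 1, 32, 280)
        f = self.cnn(x)                               # (N, 128, 8, 140)
        f = f.permute(3, 0, 1, 2).flatten(2)          # (T=140, N, 128*8)
        return self.fc(f).log_softmax(-1)             # (T, N, n_classes)

ctc = nn.CTCLoss(blank=5990, zero_infinity=True)
# Training step: loss = ctc(log_probs, targets, input_lengths, target_lengths)
```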
The DBNet-based text detection model and the DenseNet+CTC-based text recognition model are trained as follows:
A public data set of about 3.64 million images is used, split 99:1 into a training set and a validation set; the data are generated randomly from a Chinese corpus (news plus classical Chinese) with variations in font, size, grey level, blur, perspective and stretching, cover 5,990 characters in total (Chinese characters, English letters, digits and punctuation), contain a fixed 10 characters per sample randomly cut from sentences in the corpus, and have a uniform image resolution of 280×32.
Text images of commonly used drugs are synthesized with a text synthesis program to enlarge the training data.
Pre-trained models trained on other large text recognition data sets are loaded.
The DBNet text detection model and the DenseNet+CTC text recognition model are trained on the data set until the models converge.
S600: run each word or phrase of the text recognition result in turn through a low-threshold fuzzy search and match against the drug database, the search covering the drug name, brand, manufacturer and dosage attributes, to obtain one or more drug numbers.
Because text recognition on a medicine box may return incomplete text, the matching threshold of the fuzzy match is kept low, namely 0.5, i.e. a match is considered successful if half of the characters are the same. Text recognition may return several words or phrases, which are used for a low-threshold fuzzy search and match in the drug database: the drug name is matched first, and if the name matches, attributes such as manufacturer and dosage are then matched within the already matched drugs; if an attribute matches no drug at all, that match is deemed to have failed; a failed match does not narrow the matching scope but simply skips that attribute. Finally, one or more drug numbers are obtained through the fuzzy search and match.
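A minimal sketch of this low-threshold matching follows; the character-overlap ratio is one possible reading of the "half of the characters are the same" criterion, and the drug database is assumed to be a simple list of attribute dictionaries.

```python
def char_overlap(word, attr):
    """Fraction of the recognized word's characters that also occur in the attribute."""
    if not word:
        return 0.0
    return sum(1 for ch in word if ch in attr) / len(word)

def fuzzy_search(ocr_words, drug_db, threshold=0.5):
    """Low-threshold fuzzy search: drug name first, then brand/manufacturer/dosage.

    drug_db is a hypothetical list of dicts with 'id', 'name', 'brand',
    'manufacturer' and 'dosage' fields.
    """
    candidates = [d for d in drug_db
                  if any(char_overlap(w, d['name']) >= threshold for w in ocr_words)]
    for attr in ('brand', 'manufacturer', 'dosage'):
        narrowed = [d for d in candidates
                    if any(char_overlap(w, d[attr]) >= threshold for w in ocr_words)]
        if narrowed:            # a failed attribute is skipped and never narrows the set
            candidates = narrowed
    return [d['id'] for d in candidates]
```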
S700: look up the unique drug number obtained from appearance recognition among the drug numbers returned by the text-based query:
The unique drug number obtained from appearance recognition is looked up among the drug numbers returned by the text-based query; if the lookup succeeds, the medicine box recognition result is output directly.
S800: if the lookup of the appearance-based drug number among the text-based drug numbers fails, match each word or phrase of the text recognition result intelligently against the drug database in turn to obtain the final medicine box recognition result:
This covers the case where, because the appearance-based result is wrong, no drug can be found among the results returned by text recognition; in that case the box information is confirmed entirely by text recognition. Each word or phrase of the text recognition result is first matched against the drug name attribute in the database using the intelligent matching algorithm; once matching drugs are obtained, attributes such as brand, manufacturer and dosage are matched in turn within those drugs using the same algorithm, and a single drug number is finally obtained.
The intelligent matching algorithm is as follows:
The drug name attribute is matched first with an initial matching threshold of 1, i.e. a match succeeds only on an exact match. If no drug is matched at all, the match is deemed to have failed. On failure the matching threshold is lowered by 0.1 until a match succeeds; after the first successful match at a threshold other than 1, the threshold is raised in steps of 0.01 until matching fails; after that failure, the result of the last successful match before the failure is taken as the final matching result.
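A sketch of this adaptive-threshold ("intelligent") matching is given below; the similarity function is left as a parameter (for instance, difflib.SequenceMatcher(None, a, b).ratio() could serve), and the thresholds are rounded to avoid floating-point drift.

```python
def smart_match(query, items, similarity):
    """Adaptive-threshold matching as described above.

    query:      a recognized word or phrase.
    items:      candidate attribute strings (e.g. drug names).
    similarity: a 0..1 scoring function (assumed, e.g. a difflib ratio).
    """
    def matches(th):
        return [it for it in items if similarity(query, it) >= th]

    th, found = 1.0, matches(1.0)
    if found:                      # an exact match at threshold 1 is taken directly
        return found

    while th > 0 and not found:    # lower the threshold by 0.1 until something matches
        th = round(th - 0.1, 2)
        found = matches(th)

    best = found
    while found:                   # then raise it by 0.01 until matching fails
        th = round(th + 0.01, 2)
        found = matches(th)
        if found:
            best = found
    return best                    # last successful match before the failure
```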
An embodiment of the present invention also provides a deep-learning-based system for real-time identification of medicine boxes, comprising an image acquisition device and a data processing device; the system can implement the deep-learning-based method for real-time identification of medicine boxes described in the above embodiment.
The image acquisition device is used to capture images of the appearance of the medicine boxes.
The data processing device is used to identify the captured medicine box appearance images in real time on the basis of yolov5 and text recognition, and comprises a yolov5-based medicine box recognition module, a DBNet-based text detection module, a text recognition module based on DenseNet and CTC, and a medicine box identification module based on text recognition and appearance recognition.
在药盒的上方架设摄像机,采集药盒的外观图像信息的步骤之前包括:The steps of setting up a camera above the pill box to collect the appearance image information of the pill box include:
在利用神经网络模型进行药盒外观识别和文字识别之前需要对所有模型进行训练,包括基于yolov5的药盒外观图像识别模型、基于DBnet的文字检测模型、基于DenseNet+CTC的文字识别模型。Before using the neural network model for pill box appearance recognition and text recognition, all models need to be trained, including the pill box appearance image recognition model based on yolov5, the text detection model based on DBnet, and the text recognition model based on DenseNet+CTC.
在药盒的上方架设摄像机,采集药盒的外观图像信息步骤包括:Set up a camera above the pill box to collect the appearance image information of the pill box. The steps include:
将药盒放置在药盒信息采集箱内,上方架设摄像头,拍摄药盒的俯视图片,所述药盒信息采集箱的内壁采用不反光的黑色材料,并提供稳定光源。The medicine box is placed in the medicine box information collection box, and a camera is set up above to take a bird's-eye view of the medicine box. The inner wall of the medicine box information collection box is made of non-reflective black material and provides a stable light source.
用预先训练好的基于yolov5的药盒识别模型,检测和识别药盒,获得药盒边缘和基于外观的药盒的药品唯一编号;Use the pre-trained pill box recognition model based on yolov5 to detect and identify the pill box, and obtain the edge of the pill box and the unique number of the medicine based on the appearance of the pill box;
按照获取的药盒边缘对原图裁剪,获取每张图片只含单一药盒的多张图片;Crop the original image according to the edge of the obtained pill box, and obtain multiple pictures in which each picture only contains a single pill box;
对每张已剪裁的药盒图片,通过预训练好的基于DBNet的文字检测模型进行文字检测,获取药盒文字的区域;For each cropped pill box picture, text detection is performed through the pre-trained DBNet-based text detection model to obtain the area of the pill box text;
对于识别到的文字区域,利用预先训练好的基于DenseNet和 CTC的文字识别模型进行文字识别,识别出药盒的文字信息;For the recognized text area, use the pre-trained text recognition model based on DenseNet and CTC to perform text recognition and identify the text information of the pill box;
将文字识别结果的每个单词或短语依次在药品数据库中进行低阈值的模糊搜索与匹配,模糊搜索与匹配的内容为药品的药名、品牌、厂家、剂量属性,得到一种或者多种药品编号;Each word or phrase of the text recognition result is sequentially subjected to low-threshold fuzzy search and matching in the drug database. The content of fuzzy search and matching is the drug name, brand, manufacturer, and dosage attributes of the drug, and one or more drugs are obtained. serial number;
将基于外观识别的药品唯一编号在基于文字识别所查询到的药品编号中查找;若查找成功,输出识别结果;Search the unique drug number based on appearance recognition among the drug numbers queried based on text recognition; if the search is successful, output the recognition result;
若查找失败则将文字识别结果的每个单词或短语依次在药品数据库中进行智能模糊搜索与匹配,最终获得药盒识别结果。If the search fails, each word or phrase of the text recognition result will be sequentially searched and matched intelligently in the drug database, and finally the pill box recognition result will be obtained.
基于yolov5的药盒外观图像识别模型的训练包括以下步骤:The training of the pill box appearance image recognition model based on yolov5 includes the following steps:
将药盒放置在药盒信息采集箱内,上方架设一个可活动的摄像机,所述药盒信息采集箱的内壁采用不反光的黑色材料,并提供稳定光源;The pill box is placed in the pill box information collection box, and a movable camera is set up above. The inner wall of the pill box information collection box is made of non-reflective black material and provides a stable light source;
摄像机按照俯视的角度对所有要收集的药品进行拍照,获得药盒充足的外观图片,为保证训练质量,每个药盒采集的有效图片不应低于20张;The camera takes pictures of all the medicines to be collected from a bird's-eye view to obtain sufficient appearance pictures of the medicine boxes. To ensure the quality of training, the effective pictures collected for each medicine box should not be less than 20;
通过标签标注工具,将获取的药盒图片进行区域和类别的标注;Use the labeling tool to label the obtained pill box images by region and category;
加载yolov5s预训练模型;Load yolov5s pre-trained model;
在训练过程中通过mosaic数据增强等方式对药盒的外观数据进行数据增强,在经过适当的训练轮次后,直至基于yolov5的神经网络模型达到收敛。During the training process, the appearance data of the pill box is data enhanced through mosaic data enhancement and other methods. After appropriate training rounds, the neural network model based on yolov5 reaches convergence.
训练基于DBnet的文字检测模型、基于DenseNet+CTC的文字识别模型的步骤包括:The steps for training a text detection model based on DBnet and a text recognition model based on DenseNet+CTC include:
使用公共数据集,该数据集包含共约364万张图片,按照99:1划分成训练集和验证集,数据利用中文语料库(新闻+文言文),通过字体、大小、灰度、模糊、透视、拉伸等变化随机生成,包含汉字、英文字母、数字和标点共5990个字符,每个样本固定10个字符,字符随机截取自语料库中的句子,图片分辨率统一为280×3;Use a public data set, which contains a total of about 3.64 million images, divided into a training set and a verification set according to 99:1. The data uses the Chinese corpus (news + classical Chinese), through font, size, grayscale, blur, perspective, Stretching and other changes are randomly generated, including Chinese characters, English letters, numbers and punctuation, a total of 5990 characters. Each sample has a fixed 10 characters. The characters are randomly intercepted from sentences in the corpus, and the image resolution is unified to 280×3;
利用文字合成程序合成常用药品的文本图片以增加训练数据量;Use text synthesis programs to synthesize text images of commonly used drugs to increase the amount of training data;
加载在其他大型文本识别数据集上训练的预训练模型;Load pre-trained models trained on other large text recognition datasets;
利用数据集对DBnet的文字检测模型、基于DenseNet+CTC的文字识别模型进行训练,直至模型收敛。Use the data set to train the text detection model of DBnet and the text recognition model based on DenseNet+CTC until the model converges.
利用预先训练好的基于yolov5的药盒识别模型,检测和识别药盒,获得药盒边缘和基于外观的药盒识别结果中的yolov5神经网络模型采用yolov5s进行模型构建,由输入端(Input)、主干网络(Backbone)、颈部(Neck)和头部(Head)和输出端(Output)五部分组成;Use the pre-trained pill box recognition model based on yolov5 to detect and identify pill boxes, and obtain the edge of the pill box and the appearance-based pill box identification results. The yolov5 neural network model in the results uses yolov5s for model construction, which consists of the input terminal (Input), It consists of five parts: Backbone, Neck, Head and Output;
输入端输入的图像尺寸为640×640×3,采用Mosaic数据增强、Anchor自适应锚框计算和图像缩放等策略对图像进行预处理;The image size input at the input end is 640×640×3, and strategies such as Mosaic data enhancement, Anchor adaptive anchor frame calculation and image scaling are used to preprocess the image;
在yolov5中使用CSPDarknet53作为模型的主干网络,包括Focus模块、Conv模块、C3模块和SPP模块,其作用是从输入图像中提取丰富的语义特征;颈部采用FPN和PAN生成特征金字塔,用来增强对多尺度目标的检测;头部是对从颈部传递来的特征进行预测,并生成3个不同尺度的特征图;In yolov5, CSPDarknet53 is used as the backbone network of the model, including Focus module, Conv module, C3 module and SPP module. Its function is to extract rich semantic features from the input image; FPN and PAN are used in the neck to generate feature pyramids for enhancement. Detection of multi-scale targets; the head predicts the features passed from the neck and generates feature maps of three different scales;
Conv模块的结构为Conv2d+BN+SiLU,依次是卷积层、归一化操作和激活函数;Focus模块的目的是减少模型的计算量,加快网络的训练速度;首先将输入大小为3×640×640的图像切分成4个切片,其中每个切片的大小为3×320×320;然后使用拼接操作将4个切片通过通道维度拼接起来,得到的特征图尺度为12×320×320;再经过一次卷积操作,最终得32×320×320的特征图;The structure of the Conv module is Conv2d+BN+SiLU, followed by the convolution layer, normalization operation and activation function; the purpose of the Focus module is to reduce the calculation amount of the model and speed up the training speed of the network; first, the input size is 3×640 The image of After a convolution operation, the final feature map is 32×320×320;
C3模块由两个分支组成,在第一条分支中输入的特征图要通过3个连续的Conv模块和多个堆叠的Bottleneck模块;在第二条分支中,特征图仅通过一个Conv模块,最终将两个分支按通道拼接在一起;其中,Bottleneck模块可更好地提取目标的高级特征,主要由两个连续的卷积操作和一个残差操作组成;The C3 module consists of two branches. In the first branch, the input feature map passes through 3 consecutive Conv modules and multiple stacked Bottleneck modules; in the second branch, the feature map only passes through one Conv module, and finally The two branches are spliced together by channel; among them, the Bottleneck module can better extract high-level features of the target, and mainly consists of two consecutive convolution operations and a residual operation;
SPP模块是空间金字塔池化模块,用来扩大网络的感受野;在yolov5s中SPP模块的输入特征图大小为512×20×20,通过一个Conv模块后通道数减半;然后对特征图使用卷积核分别为5×5、9×9、13×13的最大池化操作,并将3种特征图与输入特征图按通道拼接后再通过一个Conv模块,最终输出的特征图大小为512×20×20。The SPP module is a spatial pyramid pooling module, used to expand the receptive field of the network; in yolov5s, the input feature map size of the SPP module is 512×20×20, and the number of channels is halved after passing a Conv module; then the volume is used on the feature map The accumulation kernels are maximum pooling operations of 5×5, 9×9, and 13×13 respectively. The three feature maps and the input feature map are spliced by channel and then passed through a Conv module. The final output feature map size is 512× 20×20.
通过预训练好的基于DBNet的文字检测模型进行文字检测中的DBNet的文字检测模型由DBNet文本检测网络组成,DBNet主要使用可微分二值化作为基于简单分割网络的文本检测器。可微分二值化公式如式1所示:The DBNet text detection model in text detection is performed through the pre-trained DBNet-based text detection model. The DBNet text detection model consists of the DBNet text detection network. DBNet mainly uses differentiable binarization as a text detector based on a simple segmentation network. The differentiable binarization formula is shown in Equation 1:
(1) (1)
其中B表示近似的二值图,T是网络学习的阈值特征图,k表示放大倍数;这里取经验值50,这样就可以更好地区分前景与背景。Among them, B represents the approximate binary image, T is the threshold feature map of network learning, and k represents the magnification factor; here, the empirical value of 50 is taken, so that the foreground and background can be better distinguished.
利用预先训练好的基于DenseNet和 CTC的文字识别模型进行文字识别的DenseNet和 CTC的文字识别模型由DenseNet神经网络由CNN层、RNN层以及CTC转录层组成;The text recognition model of DenseNet and CTC that uses pre-trained text recognition models based on DenseNet and CTC for text recognition consists of a DenseNet neural network composed of a CNN layer, an RNN layer, and a CTC transcription layer;
文字识别模型各层从下到上,功能如下:卷积层(CNN)来提取输入图像的特征;循环层(RNN)来预测CNN层输入特征序列的分布;转录层(CTC)来过滤序列标签的冗余数据;The functions of each layer of the text recognition model from bottom to top are as follows: the convolutional layer (CNN) to extract the features of the input image; the recurrent layer (RNN) to predict the distribution of the input feature sequence of the CNN layer; the transcription layer (CTC) to filter the sequence tags redundant data;
DenseNet网络选用Relu作为激活函数,使用了3个Dense Block 层进行演算,各个Dense Block之间通过Transition结构连接在一起组成的DenseNet网络,配合CTC_loss进行训练并得出最终的数据模型;The DenseNet network uses Relu as the activation function and uses 3 Dense Block layers for calculation. The DenseNet network composed of each Dense Block connected together through the Transition structure is trained with CTC_loss to obtain the final data model;
DenseNet网络使用跳跃拼接保留原本的特征,降低了梯度消失现象的发生,然而由于网络深度不断加深,导致通道数与参数量增多,使得模型很难提取深层次的特征,因此DenseNet设置了Transition转换模块;该模块用于Dense Block之后,主要用来减少通道数;同时为了使通道数更少,在每次DenseBlock拼接前都添加瓶颈结构,使通道数更少;通过池化减少参数后传送给下层的Dense Block结构,从而达到较高的精度。The DenseNet network uses skip splicing to retain the original features and reduce the occurrence of gradient disappearance. However, as the network depth continues to deepen, the number of channels and parameters increase, making it difficult for the model to extract deep features, so DenseNet sets up a Transition conversion module ;After this module is used in Dense Block, it is mainly used to reduce the number of channels; at the same time, in order to reduce the number of channels, a bottleneck structure is added before each DenseBlock splicing to reduce the number of channels; parameters are reduced through pooling and then transmitted to the lower layer. Dense Block structure to achieve higher accuracy.
将文字识别结果的每个单词或短语依次在药品数据库中进行低阈值的模糊搜索与匹配,模糊搜索与匹配的内容为药品的名称、厂家、剂量等属性,得到一种或者多种药品编号的步骤为:Each word or phrase of the text recognition result is sequentially subjected to low-threshold fuzzy search and matching in the drug database. The content of fuzzy search and matching is the name, manufacturer, dosage and other attributes of the drug, and one or more drug numbers are obtained. The steps are:
Because pill box text recognition may return incomplete text, the matching threshold of the fuzzy match is kept low; the low threshold is 0.5, meaning a result is considered a successful match if half of its characters are identical. Text recognition can yield several words or phrases, which are fuzzily searched and matched in the drug database at this low threshold. The first attribute to be matched is the drug name; if the drug name matches, the manufacturer, dosage and other attributes continue to be matched within the already matched drugs. If one attribute fails to match any drug, that attribute is regarded as a match failure; a match failure does not narrow the matching scope, and the matching of that attribute is simply skipped. Finally, one or more drug numbers are obtained through the fuzzy search and match, as sketched below.
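A simple sketch of how such a low-threshold fuzzy match could be implemented; the character-overlap similarity measure and the drug record fields are assumptions for illustration, since the patent only fixes the 0.5 threshold and the attribute order:

```python
from collections import Counter

def char_overlap(query: str, target: str) -> float:
    """Fraction of the query's characters that also occur in the target."""
    if not query:
        return 0.0
    q, t = Counter(query), Counter(target)
    common = sum(min(q[c], t[c]) for c in q)
    return common / len(query)

def fuzzy_match(phrases, drug_db, threshold=0.5):
    """Match recognized phrases against drug records attribute by attribute.

    drug_db is assumed to be a list of dicts with 'id', 'name', 'manufacturer'
    and 'dosage' fields. A failed attribute does not narrow the scope; it is skipped.
    """
    candidates = [d for d in drug_db
                  if any(char_overlap(p, d["name"]) >= threshold for p in phrases)]
    if not candidates:
        return []
    for attr in ("manufacturer", "dosage"):
        narrowed = [d for d in candidates
                    if any(char_overlap(p, d[attr]) >= threshold for p in phrases)]
        if narrowed:              # only narrow the scope when the attribute matched
            candidates = narrowed
    return [d["id"] for d in candidates]
```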
The unique drug number obtained from appearance recognition is looked up among the drug numbers retrieved through text recognition. If the lookup succeeds, the pill box recognition result is output directly; if it fails, each word or phrase of the text recognition result is matched in the drug database in turn by intelligent matching, and the pill box recognition result is finally obtained:
This covers the case where, because the appearance-based recognition result is wrong, no drug can be found among the results obtained by text recognition; in that case the pill box information is confirmed entirely by text recognition. First, each word or phrase of the text recognition result is matched in turn against the drug name attribute in the drug database using the intelligent matching algorithm. Once matching drugs are obtained, the intelligent matching algorithm is then used within these drugs to match the brand, manufacturer, dosage and other attributes, and a single drug number is finally obtained.
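The overall decision between the two recognition routes could be outlined as follows; this is a hypothetical sketch that assumes the illustrative `fuzzy_match` routine above and an `intelligent_match` routine like the one sketched after the algorithm description below, with function names that are not taken from the patent:

```python
def recognize_pill_box(appearance_id, phrases, drug_db):
    """Combine appearance-based and text-based recognition of one pill box.

    appearance_id: unique drug number predicted from the pill box appearance.
    phrases: words/phrases returned by text recognition on the same box.
    """
    text_candidates = fuzzy_match(phrases, drug_db)      # low-threshold fuzzy search
    if appearance_id in text_candidates:
        return appearance_id                              # the two routes agree
    # Appearance result is treated as wrong: rely entirely on text recognition,
    # matching the name first and then the remaining attributes.
    return intelligent_match(phrases, drug_db)
```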
The intelligent matching algorithm is as follows:
The drug name attribute is matched first, with an initial matching threshold of 1, i.e. only an identical string counts as a successful match. If no drug is matched at all, this is treated as a match failure. On a match failure the matching threshold is lowered by 0.1 until a match succeeds; after the first successful match at a threshold other than 1, the matching threshold is raised in steps of 0.01 until matching fails; once matching fails, the result of the last successful match before the failure is taken as the final matching result.
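A minimal sketch of this adaptive-threshold search, reusing the illustrative `char_overlap` similarity from the earlier sketch; the similarity measure and the handling of an attribute that never matches are assumptions, while the threshold schedule follows the description above:

```python
def match_at(phrases, records, attr, threshold):
    """Records whose attribute matches any recognized phrase at the given threshold."""
    return [r for r in records
            if any(char_overlap(p, r[attr]) >= threshold for p in phrases)]

def intelligent_match(phrases, drug_db,
                      attrs=("name", "brand", "manufacturer", "dosage")):
    """Adaptive-threshold matching: start exact, relax by 0.1 until something
    matches, then tighten by 0.01 and keep the last non-empty result."""
    candidates = drug_db
    for attr in attrs:
        threshold = 1.0
        matched = match_at(phrases, candidates, attr, threshold)
        while not matched and threshold > 0:       # relax until a match appears
            threshold = round(threshold - 0.1, 2)
            matched = match_at(phrases, candidates, attr, threshold)
        if not matched:
            continue                               # attribute could not be matched at all
        if threshold < 1.0:
            while True:                            # tighten until matching fails
                tighter = round(threshold + 0.01, 2)
                refined = match_at(phrases, candidates, attr, tighter)
                if not refined:
                    break                          # keep the last successful result
                matched, threshold = refined, tighter
        candidates = matched
    return candidates[0]["id"] if candidates else None
```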
Although preferred examples of the present invention have been described, those skilled in the art can make further changes and modifications to these examples once they have grasped the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred examples as well as all changes and modifications that fall within the scope of the present invention. Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310718494.6A CN116758525B (en) | 2023-06-16 | 2023-06-16 | A real-time medicine box recognition method and system based on deep learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116758525A true CN116758525A (en) | 2023-09-15 |
| CN116758525B CN116758525B (en) | 2025-08-08 |
Family
ID=87950899
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310718494.6A Active CN116758525B (en) | 2023-06-16 | 2023-06-16 | A real-time medicine box recognition method and system based on deep learning |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116758525B (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117373602A (en) * | 2023-10-13 | 2024-01-09 | 北京百度网讯科技有限公司 | Medical record generation method and device |
| CN117370596A (en) * | 2023-10-13 | 2024-01-09 | 北京百度网讯科技有限公司 | Medicine knowledge retrieval method and device |
| US12272044B2 (en) | 2021-04-07 | 2025-04-08 | Optum, Inc. | Production line conformance measurement techniques using categorical validation machine learning models |
| TWI891171B (en) * | 2023-12-18 | 2025-07-21 | 所羅門股份有限公司 | Object identification method and system and computer program product |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180060701A1 (en) * | 2016-08-31 | 2018-03-01 | Adobe Systems Incorporated | Deep-learning network architecture for object detection |
| CN111339249A (en) * | 2020-02-20 | 2020-06-26 | 齐鲁工业大学 | A deep intelligent text matching method and device combining multi-angle features |
| CN111985574A (en) * | 2020-08-31 | 2020-11-24 | 平安医疗健康管理股份有限公司 | Medical image recognition method, device, equipment and storage medium |
| WO2022227218A1 (en) * | 2021-04-30 | 2022-11-03 | 平安科技(深圳)有限公司 | Drug name recognition method and apparatus, and computer device and storage medium |
Non-Patent Citations (1)
| Title |
|---|
| 蒋良卫; 黄玉柱; 邓芙蓉: "Research on image text extraction technology based on deep learning" (基于深度学习技术的图片文字提取技术的研究), 信息系统工程 (Information Systems Engineering), no. 03, 20 March 2020 (2020-03-20) * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116758525B (en) | 2025-08-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113705733B (en) | Medical bill image processing method and device, electronic device, and storage medium | |
| CN116758525A (en) | Medicine box real-time identification method and system based on deep learning | |
| CN114937176B (en) | A real-time drug identification method and system based on deep learning | |
| US20120290988A1 (en) | Multifaceted Visualization for Topic Exploration | |
| CN112686258A (en) | Physical examination report information structuring method and device, readable storage medium and terminal | |
| CN107239731A (en) | A kind of gestures detection and recognition methods based on Faster R CNN | |
| CN114863408B (en) | Document content classification method, system, device and computer-readable storage medium | |
| WO2022227218A1 (en) | Drug name recognition method and apparatus, and computer device and storage medium | |
| Ou et al. | Automatic drug pills detection based on convolution neural network | |
| TWI723868B (en) | Method for applying a label made after sampling to neural network training model | |
| CN113947776A (en) | Method and device for determining structured prescription information of prescription image | |
| CN114419391B (en) | Target image recognition method and device, electronic device and readable storage medium | |
| CN114638973A (en) | Target image detection method and image detection model training method | |
| CN117954045A (en) | Automatic medicine sorting management system and method based on prescription data analysis | |
| CN118097688A (en) | A general document recognition method based on large language model | |
| Maitrichit et al. | Intelligent medicine identification system using a combination of image recognition and optical character recognition | |
| CN115063784A (en) | Bill image information extraction method and device, storage medium and electronic equipment | |
| CN110298841A (en) | A kind of Image Multiscale semantic segmentation method and device based on converged network | |
| Dahl et al. | Applications of machine learning in tabular document digitisation | |
| CN118587730A (en) | An optical character recognition method for medical images | |
| CN114332574A (en) | Image processing method, device, device and storage medium | |
| CN115908464B (en) | Tongue image segmentation method and system | |
| CN115731556A (en) | Image processing method and device, electronic equipment and readable storage medium | |
| CN114780773A (en) | Document and picture classification method and device, storage medium and electronic equipment | |
| CN120219238A (en) | A method, system, terminal and storage medium for repairing cultural relic fragments of different origins |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||