CN118428615A

CN118428615A - Interactive system for scheduling and dispatching tax personnel

Info

Publication number: CN118428615A
Application number: CN202310049548.4A
Authority: CN
Inventors: 袁林萍; 屈华民; 张�荣
Original assignee: Hong Kong University of Science and Technology
Current assignee: Hong Kong University of Science and Technology
Priority date: 2023-02-01
Filing date: 2023-02-01
Publication date: 2024-08-02

Abstract

The present disclosure provides an interactive system that helps tax managers schedule and dispatch tax staff so that daily tax activities are performed efficiently. Drawing on the availability of historical data, the system consists of three modules: a scheduling module, a dispatching module, and a user interface. The scheduling module enables tax managers to schedule enough tax staff in advance by predicting future application counts from a non-stationary historical time series using a time series prediction algorithm. The dispatching module matches applications with tax staff in real time in order to balance workload, reduce waiting time, and reduce total processing time. The user interface provides intuitive interactions and visualizations that allow non-technical tax managers to control the algorithms and inspect the scheduling and dispatching results.

Description

Interactive system for scheduling and dispatching tax personnel

Technical Field

The present disclosure relates to an interactive system, and in particular to an interactive system for staff scheduling and dispatching at a tax authority.

Background

Tax collection and payment are important public affairs. With the development of information systems in recent years, taxpayers can submit tax applications in a variety of ways, such as by visiting a tax service center or by uploading documents from home via a smartphone or laptop. These applications are assigned to and processed by tax staff. Taxation in China has two characteristics. First, because the tax base is large, a large number of applications arrive every day. Second, different tax types require different domain knowledge to handle.

These characteristics pose two major challenges for tax authorities that want to provide efficient services. First, when many taxpayers request services through various channels at the same time, tax staff may be able to handle only some of the applications, resulting in long queues or even postponement to another day. Second, because tax types are complex and diverse, tax staff have different levels of expertise and proficiency. An application may be assigned to a staff member who would need more ability or knowledge to handle it, resulting in longer processing time and lower service quality. Recruiting more tax staff can serve taxpayers better, but too many staff may waste human resources and increase training and management costs.

Staff scheduling and dispatching are therefore necessary. Tax authorities need to arrange employees' working hours appropriately and match applications with employees in order to perform daily tax activities well. However, traditional manual or rule-based scheduling and dispatching methods cannot cope with increasingly complex tax scenarios because of their slow response, inaccuracy, and low intelligence. Fortunately, each application can now be tracked and stored in a database as historical data. The availability of historical data (such as daily application counts and processing times) makes it possible to develop a system that lets managers schedule and dispatch employees efficiently and flexibly with data support. However, it is unclear how to design and use such a system to achieve data-assisted scheduling and dispatching in tax authorities.

Time series forecasting has been applied in various fields, such as production planning, inventory analysis, and air pollution. Time series forecasting algorithms can be roughly classified into two categories: statistical methods and deep learning methods.

Commonly used statistical time series forecasting models include autoregressive (AR) models, moving average (MA) models, autoregressive moving average (ARMA) models, and autoregressive integrated moving average (ARIMA) models. These models rest on the assumption that the observation at the current time step is related to observations at previous time steps, so past trends and patterns can be used to predict future ones. These algorithms are easy to use, and their forecasts are easy to interpret. However, they also have several limitations: they depend on the choice of parameters; they are better suited to modeling linear series than nonlinear series; and they require the historical data to be stationary, or stationary after differencing (that is, the statistical properties of the series do not change over time).
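The differencing step mentioned above can be illustrated with a minimal sketch. The helper name `difference` and the daily counts are hypothetical and are not taken from the disclosure; the point is only that a series with a steady trend has a changing mean, while its first difference does not.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical daily application counts with a steady upward trend (not real data).
counts = [100 + 5 * day for day in range(10)]

# The raw counts keep growing, but the first difference is constant,
# which is the kind of series ARIMA-style models expect after differencing.
diffed = difference(counts)
weekly = difference(counts, lag=7)
```

A seasonal lag (for example `lag=7` for a weekly pattern) can be used in the same way. Irregular shocks generally cannot be removed by differencing, which is part of the motivation for the deep learning approach discussed next.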

Recently, researchers have designed and applied deep learning methods for time series forecasting. For example, Lai et al. proposed LSTNet, which combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to extract short-term and long-term information from multivariate time series. Researchers are also exploring how to apply Transformers to time series forecasting. However, real-world time series are usually non-stationary, so some recent works have proposed deep learning models for non-stationary time series data. AdaRNN is one of the state-of-the-art models and offers a new perspective on time series modeling based on domain generalization in transfer learning. It formulates the temporal covariate shift problem and builds a two-step adaptive RNN model. The present disclosure adopts AdaRNN to predict future tax application counts because the historical application data of tax authorities are non-stationary, owing to the impact of Covid-19 and the alternation between busy tax seasons and ordinary seasons.

Resource scheduling algorithms are designed to coordinate resource demanders and service providers. In the tax setting, applications need to be matched with tax staff based on real-time conditions.

Scheduling problems are widely modeled as multi-objective optimization problems. One challenge in solving such problems is that the objectives usually conflict with each other, so it is difficult to drive every objective to its optimum at the same time. Existing solutions fall into two general categories: converting the multiple objectives into a single objective, and generating a set of Pareto solutions. Methods in the first category can produce only one solution because they solve a single-objective optimization problem. Decision makers may prefer the second category, which gives them a set of Pareto solutions among which they can make their own trade-offs. A Pareto-optimal solution is one that no other solution dominates, that is, no other solution is at least as good in every objective and strictly better in at least one. The target users of the proposed system (i.e., tax managers) want to obtain multiple solutions and make the decision themselves. The present disclosure therefore takes genetic algorithms as its starting point, since they can efficiently return a set of Pareto solutions. There are many variants of genetic algorithms, such as MOGA, NPGA, WBGA, NSGA, and NSGA-II. These variants differ from each other in their fitness assignment procedure, elitism, or diversification method. The present disclosure applies the idea of genetic algorithms to solve the resource scheduling problem in tax scenarios.

The present disclosure frames the domain problem of employee scheduling and dispatching in tax scenarios as a time series forecasting problem and a resource scheduling problem, and proposes an employee scheduling and dispatching system that meets the needs of tax authorities.

Summary of the Invention

The present disclosure is an interactive system for tax managers to schedule and dispatch tax staff. Given a large number of applications and complex processing procedures, tax authorities are paying increasing attention to how to schedule and dispatch employees effectively so as to provide good service to taxpayers. The availability of historical application data enables tax managers to schedule and dispatch employees with data support, but it is not clear how to use the historical data properly. The present disclosure therefore provides an interactive system based on the needs of tax managers. The system can schedule employees based on application counts predicted by a time series prediction algorithm, dispatch employees in real time using a genetic algorithm, and display the results using intuitive and interactive visualizations. More specifically, the system first uses the state-of-the-art deep learning model AdaRNN to predict future tax application counts and informs managers of the number of applications they will receive for each tax type in a given period. Then, based on these results and on each employee's historical processing time for applications of each tax type, the system provides the number of employees required for each day of the coming week. Second, the system embeds a customized real-time resource scheduling algorithm based on the genetic algorithm NSGA-II. The algorithm continuously scans pending applications and matches applications with employees, aiming for the most balanced workload, the least waiting time, and the least total processing time. Finally, the system provides a user interface that further helps managers schedule and dispatch tax employees by uploading historical data, training new models, and viewing the results as line charts and Gantt charts.

In one aspect, the present disclosure provides an interactive system for scheduling and dispatching tax staff, including: a scheduling module configured to obtain historical data and to predict future application counts from the historical data using a time series prediction algorithm, so as to output a scheduling result; a dispatching module configured to receive input information and to use an NSGA-II-based resource scheduling algorithm to match, in real time, the employees in the scheduling result with the applications to be processed, so as to generate a dispatching result; and a user interface that displays the scheduling result and the dispatching result.

In an embodiment, the time series prediction algorithm includes one of a SARIMA algorithm and an AdaRNN deep learning algorithm.

In an embodiment, the time series prediction algorithm includes an AdaRNN deep learning algorithm.

In an embodiment, the historical data includes: tax type, the average processing time for each employee to process applications of the tax type, and historical daily application counts.

In an embodiment, the historical daily application counts form a non-stationary time series in which the applications are arranged in chronological order.

In an embodiment, the input information includes real-time employee availability, the currently pending applications in chronological order, and each employee's proficiency in the tax type.

In an embodiment, the dispatching result includes a plurality of dispatching results that each achieve at least one of the following objectives: minimizing the difference between employees' working times; minimizing the total processing time; and minimizing the total waiting time of taxpayers.

In an embodiment, the NSGA-II-based resource scheduling algorithm performs the following steps: encoding, population initialization, crossover, mutation, fitness calculation, non-dominated sorting, crowding distance calculation, selection, generation of a new population, checking of the termination condition, and return of multiple solutions.

In an embodiment, the user interface includes:

a data panel that allows the user to upload the historical data;

an algorithm selection and settings panel that allows the user to select one of the SARIMA algorithm and the AdaRNN deep learning algorithm and to set the corresponding parameters;

a visualization panel that displays the historical daily application counts and the predicted future application counts as a line chart;

a scheduling settings panel that allows the user to set scheduling-related parameters; and

a scheduling result panel that displays the scheduling result as a line chart.

In an embodiment, the scheduling-related parameters include the default working minutes of each employee and the number of days to forecast.

In another aspect, the present disclosure provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following operations: obtaining historical data and predicting future application counts from the historical data using a time series prediction algorithm, so as to output a scheduling result; receiving input information and using an NSGA-II-based resource scheduling algorithm to match, in real time, the employees in the scheduling result with the applications to be processed, so as to generate a dispatching result; and displaying the scheduling result and the dispatching result to a user.

Brief Description of the Drawings

Illustrative, non-limiting example embodiments will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings.

FIG. 1 is an architecture diagram of an interactive system according to an embodiment of the present disclosure.

FIG. 2 is a schematic diagram of predicting future application counts from a non-stationary historical time series using the AdaRNN algorithm.

FIG. 3 is a flow chart of a customized Non-dominated Sorting Genetic Algorithm II (NSGA-II) that solves the resource scheduling problem in the tax setting and matches taxpayers' applications with suitable employees.

FIG. 4 shows the user interface of the scheduling module.

Detailed Description

The prior art includes systems and algorithms for scheduling staff in clinics, call centers, and general workplaces. However, none of them targets the scheduling and dispatching of tax authority staff, which has the following unique characteristics.

First, to schedule tax staff in advance, tax managers need to know the predicted future application counts, so a time series forecasting algorithm is required. Because of the impact of Covid-19 and the alternation between busy tax seasons and ordinary seasons, the time series of historical applications is non-stationary. None of the prior art takes this property into account when scheduling staff.

Second, to dispatch tax staff, tax managers want to match employees and applications appropriately so as to achieve the most balanced workload, the least waiting time, and the least processing time. They also want several dispatching plans to choose from so that they can handle different situations. For example, on a very busy day a tax manager may sacrifice workload balance and assign more applications to the more proficient employees in pursuit of efficiency and shorter taxpayer waiting times. On a day with fewer taxpayers, the manager may care more about balancing working time across employees, to avoid inappropriate allocations that overwork some of them. The prior art, however, mainly provides a single plan.

Third, obtaining scheduling and dispatching results from algorithms requires a certain degree of user involvement, including adjusting parameters, comparing results, and ranking multiple dispatching plans, but tax managers have no technical background and cannot operate scheduling and dispatching algorithms directly. An intuitive interface is needed to bridge tax managers and the algorithms. The prior art does not provide a dedicated user interface.

The present disclosure improves on the prior art and addresses the above characteristics with the following features:

1. The present disclosure designs and develops a dedicated system for tax managers to schedule and dispatch staff. The system can predict the application counts of different tax types in advance in order to schedule employees, can match applications with suitable tax employees in real time in order to dispatch them, and provides an intuitive user interface and interactions through which managers obtain the results. Compared with prior art that targets call centers or other kinds of staff, the present disclosure is adapted to the tax setting by taking its specific data types, data characteristics, and workflows into account. The present disclosure can handle non-stationary historical data and provide forecasts for use in scheduling.

2. The present disclosure adapts a genetic algorithm to the specific characteristics of the tax setting, so that applications are matched with suitable tax staff under multiple objectives and tax managers are offered several alternative plans.

3. The present disclosure designs a user interface with data visualization that lets non-technical tax managers interact easily with advanced algorithms, giving non-technical personnel an integrated solution for scheduling and dispatching tax employees.

Figure 1 shows the system architecture. The historical data is first processed to obtain useful information, such as the time series of historical daily application counts, the tax types, the tax employees, and their average processing times for different types of applications. Then a scheduling module is developed to predict application counts and provide scheduling results, and a dispatching module is developed to match available employees with applications in real time during working hours. Finally, the scheduling module and the dispatching module are combined with a visual user interface that presents the results through intuitive visualizations.

Scheduling Module

The scheduling module is intended to let managers arrange, in advance, enough employees whose abilities and experience are sufficient to handle future applications. Based on the requirements of tax managers, the scheduling module needs to consider 1) the tax type of each application, 2) the time each employee takes to process each application type, and 3) the daily application count.

These three factors differ in their uncertainty, where uncertainty refers to whether a factor's value changes over time and how much it fluctuates. The tax type is the least uncertain, because tax authorities offer essentially the same tax types and update them only after a long period. The processing time is somewhat uncertain because it is affected by various factors, such as how familiar the taxpayer or employee is with the procedure, or whether they are in a hurry. However, proficiency is assumed to remain stable over a short period, so in an embodiment the scheduling module uses the average of the historical processing times. The daily application count is the hardest of the three to control. Unlike the other two factors, knowing its history is not enough, because the goal of the scheduling module is to schedule tax employees for the future. The application counts for the coming days therefore need to be known.

The design of the scheduling module follows from this analysis. As shown in the architecture of Figure 1, using historical data (A) and mined information (B), the interactive system supports tax managers in scheduling and dispatching tax employees through a scheduling module (C), a dispatching module (D), and a user interface (E). As shown in (C) of Figure 1, the inputs of the scheduling module are the tax types, the average processing times, and the historical daily application counts. The first step is to predict future application counts: the historical application counts are arranged in chronological order to obtain a historical time series, and an appropriate time series prediction algorithm is then used to predict future applications. Then, based on the tax types, processing times, and predicted application counts, the module outputs the scheduling result (i.e., the number of applications that will be received for each tax type on each future day, and the corresponding number of employees with the required skills). The most difficult and important step is time series prediction, which is described in detail below.
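The last step of the scheduling module, turning predicted counts and average processing times into a head count, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the disclosed implementation: the function name `required_staff`, the example tax types, the use of one average processing time per tax type (the disclosure tracks averages per employee), and the 480-minute default working day are all hypothetical.

```python
import math


def required_staff(predicted_counts, avg_minutes, working_minutes_per_day=480):
    """Estimate how many employees are needed per tax type and per day.

    predicted_counts: {tax_type: [predicted count for day 0, day 1, ...]}
    avg_minutes:      {tax_type: average handling minutes per application}
    working_minutes_per_day: assumed default working minutes of one employee
    """
    plan = {}
    for tax_type, counts in predicted_counts.items():
        minutes = avg_minutes[tax_type]
        # Total workload in minutes divided by one employee's capacity, rounded up.
        plan[tax_type] = [
            math.ceil(count * minutes / working_minutes_per_day) for count in counts
        ]
    return plan


# Hypothetical two-day forecast for two tax types (not real data).
forecast = {"vat": [120, 96], "income": [40, 10]}
handling = {"vat": 12, "income": 30}
plan = required_staff(forecast, handling)
```

Rounding up keeps the estimate conservative. A real deployment would also have to account for breaks, absences, and employees who handle more than one tax type.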

Non-stationary time series forecasting

A time series is considered non-stationary when its statistical properties (e.g., mean and variance) change over time. The changes in the data distribution of a non-stationary time series can be regular or irregular. In the tax setting, the time series of historical application counts is non-stationary because of the regular alternation between busy and ordinary tax seasons and the irregular impact of Covid-19. Applying statistical methods (e.g., AR, MA, and ARIMA) to non-stationary time series often yields poor forecasting performance. The reason is that these models assume a stationary series (typically with normally distributed errors), and a non-stationary series violates this assumption.
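A quick way to see this kind of non-stationarity is to compare the mean and variance of consecutive segments of the series. The sketch below uses only the Python standard library; the function name `segment_stats` and the sample counts are hypothetical and chosen to mimic an ordinary season followed by a busy season.

```python
from statistics import mean, pvariance


def segment_stats(series, k):
    """Split a series into k contiguous, nearly equal segments.

    Returns one (mean, population variance) pair per segment.
    """
    n = len(series)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [
        (mean(series[start:end]), pvariance(series[start:end]))
        for start, end in zip(bounds, bounds[1:])
    ]


# Hypothetical daily counts: an ordinary season followed by a busy season.
quiet = [50, 52, 48, 51, 49, 50]
busy = [200, 260, 180, 240, 220, 190]
stats = segment_stats(quiet + busy, 2)
```

Here both the mean and the variance jump between the two segments, which is the behaviour the paragraph above describes. Formal stationarity tests exist, but this segment-wise comparison is enough to show why a single fixed-parameter model struggles.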

To address this challenge, the present application uses a state-of-the-art deep learning algorithm called AdaRNN to predict future application counts under the influence of tax seasons and Covid-19. Figure 2 shows how AdaRNN is adapted. The algorithm first uses Temporal Distribution Characterization (TDC) to split the original data into periods, and then uses Temporal Distribution Matching (TDM) with a GRU network to predict future values under a given loss function. Specifically, AdaRNN defines temporal covariate shift as the phenomenon in which the data distribution of a non-stationary series changes over time, and it adopts a transfer learning approach. Temporal covariate shift is an extension of covariate shift, an important problem in machine learning. As shown in Figure 3, temporal covariate shift arises when a continuous time series is divided into K segments and the probability distribution of each segment is different. Temporal distribution characterization and temporal distribution matching are the two steps for addressing temporal covariate shift.

Temporal distribution characterization splits the entire time series into K discrete segments so that the differences between the segments' probability distributions can be computed. On the assumption that a model trained to reduce large distribution differences will generalize better, the goal of temporal distribution characterization is to split the series into the K most dissimilar segments. It therefore solves the following optimization using a greedy strategy:
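The formula itself did not survive in this text. As a hedged reconstruction that follows the published AdaRNN formulation rather than anything recoverable from this document, the objective can be written as below. The symbols are assumptions: $\mathcal{D}_i$ is the $i$-th segment, $K_0$ is an upper bound on the number of segments, $\Delta_1$ and $\Delta_2$ bound the segment length, and $n$ is the length of the series.

```latex
\max_{0 < K \le K_0} \; \max_{n_1, \dots, n_K} \;
\frac{1}{K} \sum_{1 \le i \ne j \le K} \mathrm{dist}\!\left(\mathcal{D}_i, \mathcal{D}_j\right)
\qquad \text{s.t.} \quad
\forall i,\; \Delta_1 < \lvert \mathcal{D}_i \rvert < \Delta_2,
\qquad \sum_{i=1}^{K} \lvert \mathcal{D}_i \rvert = n
```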

Here, dist denotes the distance between the distributions of two segments, which can be computed in several ways, such as cosine distance or maximum mean discrepancy (MMD) (see A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. Fukumizu, and B. K. Sriperumbudur, "Optimal kernel choice for large-scale two-sample tests," Advances in Neural Information Processing Systems, 25, 2012). Temporal distribution matching is performed after temporal distribution characterization. Specifically, a GRU network is used to learn what time series segments with different distributions have in common. The method also dynamically learns the importance α of different time steps using a boosting-based approach. This reduces the temporal correlation more precisely and yields the optimal parameters θ of the prediction model. The process can be expressed as follows:
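This formula is also missing from the text. A hedged reconstruction in the spirit of the published AdaRNN objective is given below. The symbols are assumptions: $\mathcal{L}_{pred}$ is the prediction loss, $\mathcal{L}_{tdm}$ is the importance-weighted distribution distance between two segments, and $\lambda$ is a trade-off hyperparameter.

```latex
\mathcal{L}(\theta, \alpha) \;=\;
\mathcal{L}_{pred}(\theta) \;+\;
\lambda \, \frac{2}{K(K-1)} \sum_{1 \le i < j \le K}
\mathcal{L}_{tdm}\!\left(\mathcal{D}_i, \mathcal{D}_j;\, \theta, \alpha\right)
```

The factor $2 / (K(K-1))$ averages the matching loss over all unordered pairs of segments.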

Here one term is the prediction loss function, and the other is the distance between the distributions of any two segments.

Dispatching Module

The scheduling module predicts taxpayer demand in advance and then arranges an appropriate number of tax staff. However, it achieves only a coarse match between supply and demand and cannot handle real-time dynamic matching. A dispatching module is therefore needed to match applications with appropriate employees dynamically and in real time.

This problem is framed as a resource scheduling problem, and the present disclosure aims to coordinate resource providers and demanders under several objectives. In the tax setting, the providers are tax employees, and the resources are their time and expertise; the demanders are taxpayers and their applications. The objectives of tax managers are 1) to minimize the difference between employees' working times to avoid overwork, 2) to minimize the total processing time by assigning employees the applications they are good at, and 3) to minimize the total waiting time of taxpayers to increase their satisfaction.
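The three objectives can be made concrete with a small evaluation function. This is a sketch under stated assumptions, not the disclosed algorithm: an assignment is encoded as a list in which position i holds the index of the employee who handles application i; each employee processes assigned applications in arrival order; and the "difference between working times" is measured as the gap between the busiest and the least busy employee. The names `evaluate`, `apps`, and `skill` and all numbers are hypothetical.

```python
def evaluate(assignment, applications, minutes):
    """Score one assignment on the three objectives (all to be minimized).

    assignment[i]:  index of the employee handling applications[i]
    applications:   list of (arrival_minute, tax_type), sorted by arrival time
    minutes[e][t]:  minutes employee e needs for an application of tax type t
    Returns (workload_gap, total_processing_time, total_waiting_time).
    """
    n_staff = len(minutes)
    free_at = [0] * n_staff  # minute at which each employee becomes free
    worked = [0] * n_staff   # total minutes each employee spends processing
    total_wait = 0
    for (arrival, tax_type), employee in zip(applications, assignment):
        start = max(arrival, free_at[employee])
        duration = minutes[employee][tax_type]
        total_wait += start - arrival
        free_at[employee] = start + duration
        worked[employee] += duration
    return (max(worked) - min(worked), sum(worked), total_wait)


# Hypothetical data: three applications and two employees with different skills.
apps = [(0, "vat"), (0, "income"), (5, "vat")]
skill = [{"vat": 10, "income": 30}, {"vat": 20, "income": 15}]
balanced = evaluate([0, 1, 0], apps, skill)
overloaded = evaluate([0, 0, 0], apps, skill)
```

In this example the first assignment is better on all three objectives than sending everything to one employee. In general the objectives conflict, which is why a set of trade-off solutions is returned instead of a single one.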

Because there are multiple objectives, the present application models this dispatching problem as a multi-objective optimization problem. Because managers want several choices instead of a single optimized result, the present disclosure chooses a genetic algorithm, which can return a set of solutions at once. As shown in (D) of Figure 1, the input of the algorithm includes real-time information, such as employee availability and the sequence of applications, labeled with tax types and sorted by start time. The input also includes each employee's proficiency in the different tax types, mined from the historical data. The genetic algorithm integrates the three objectives and returns multiple solutions, each of which provides a matching between applications and employees. Genetic algorithms and the algorithm design of the present disclosure are briefly introduced next.

遗传算法介绍Introduction to Genetic Algorithms

遗传算法受到自然选择和进化理论的启发。在它们的术语中，优化问题的解被称为个体或染色体。染色体由称为基因的离散单元组成，并且染色体的集合构成群体。Genetic algorithms are inspired by natural selection and evolution theory. In their terminology, the solution to the optimization problem is called an individual or chromosome. A chromosome consists of discrete units called genes, and a collection of chromosomes constitutes a population.

When solving a single-objective optimization problem, a genetic algorithm starts from an initial population and evolves it with three operators: crossover, mutation, and selection. In the crossover operator, the algorithm uses a fitness function (which reflects the likelihood of survival) to select two parent chromosomes with good fitness and combines them to form offspring chromosomes. The mutation operator introduces random changes into a chromosome by altering some of its genes with a low probability. The chromosomes produced by crossover and mutation are added to the population. Then, in the selection operator, the algorithm selects chromosomes according to their fitness values to form the next generation of the population. When solving a multi-objective optimization problem, the genetic algorithm involves multiple fitness functions, one for each objective, and returns a set of non-dominated solutions. A solution in this set may be worse than other non-dominated solutions on a subset of the objectives, but no other solution is at least as good on every objective and strictly better on at least one. The remaining solutions in the search space are called dominated solutions.

An intuitive but time-consuming way to obtain the non-dominated solutions is to compare every pair of chromosomes in the population with respect to all objectives. To speed this up, Deb et al. proposed a variant of the genetic algorithm called the non-dominated sorting genetic algorithm II (NSGA-II), which uses non-dominated sorting to assign chromosomes to non-dominated fronts of different levels and computes crowding distances to preserve diversity. The present disclosure proposes an NSGA-II-based resource scheduling algorithm for the tax setting, because NSGA-II has proven very effective in many fields.
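The dominance relation and the brute-force pairwise comparison described above can be sketched as follows. This is a minimal illustration, assuming that every objective is to be minimized and that each solution is summarized by a tuple of fitness values; the function names `dominates` and `non_dominated` are chosen here for illustration and do not come from the disclosure.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if fitness vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Brute-force filter: keep the points that no other point dominates.

    Every pair is compared, so the cost grows quadratically with the
    population size. This is the slow approach that NSGA-II improves on.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (total processing time, total waiting time) pairs.
    candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(non_dominated(candidates))
```

In this toy input, (3, 3) is dominated by (2, 2), while the other three points trade one objective against the other and therefore all remain in the non-dominated set.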

NSGA-II-Based Resource Scheduling Algorithm

算法流程图(图3)示出了设计NSGA-II算法来解决资源调度问题，以及在三个目标下将纳税人的申请与合适的员工匹配。基于上述对通用遗传算法和NSGA-II的介绍，将从以下方面详细描述了本申请的算法设计。The algorithm flow chart (Figure 3) shows the design of the NSGA-II algorithm to solve the resource scheduling problem and match the taxpayer's application with the appropriate employee under the three objectives. Based on the above introduction to the general genetic algorithm and NSGA-II, the algorithm design of this application will be described in detail from the following aspects.

Gene and chromosome encoding (Figure 3(A)). The first step is to establish a mapping between the solution space and chromosomes such that each chromosome corresponds to a unique solution. In the tax setting, given a sequence of applications ordered by submission time, the desired solution is the sequence of employees who will handle those applications. This mapping between solutions and chromosomes is shown in Figure 3(A). Assume that there are M taxpayer applications and N employees. The chromosome is a one-dimensional array of length M in which the value of each gene identifies an employee and the index of the gene is the sequence number of the application. For example, if the third gene has the value 2, the third application is handled by the second tax employee.

Population initialization (Figure 3(B)). Given a population size, the initial population is generated randomly.

Crossover operator (Figure 3(C)). Single-point crossover is used: a point is picked at random, both parent chromosomes are cut at that point, and the genes to the right of the point are exchanged between the two parents.

Mutation operator (Figure 3(D)). Given a mutation rate (i.e., the probability that a gene's value is changed), the value of a gene is changed within the permissible range.
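The encoding, random initialization, single-point crossover, and mutation steps above can be sketched as follows. This is an illustrative sketch only: employees are assumed to be numbered 1 to N, the permissible range for a gene is assumed to be that set of employee numbers, and the function names and the use of a seeded `random.Random` are choices made here, not details given in the disclosure.

```python
import random
from typing import List, Tuple

Chromosome = List[int]  # gene i holds the employee number (1..N) for application i


def init_population(pop_size: int, num_apps: int, num_employees: int,
                    rng: random.Random) -> List[Chromosome]:
    """Randomly assign an employee to every application, pop_size times."""
    return [[rng.randint(1, num_employees) for _ in range(num_apps)]
            for _ in range(pop_size)]


def single_point_crossover(p1: Chromosome, p2: Chromosome,
                           rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Cut both parents at one random point and swap the right-hand parts.

    Requires chromosomes of length >= 2 so that the cut falls inside them.
    """
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(chrom: Chromosome, rate: float, num_employees: int,
           rng: random.Random) -> Chromosome:
    """With probability `rate`, redraw a gene from the valid employee numbers.

    The redrawn value may coincide with the old one; that is accepted here.
    """
    return [rng.randint(1, num_employees) if rng.random() < rate else gene
            for gene in chrom]


if __name__ == "__main__":
    rng = random.Random(42)
    population = init_population(pop_size=4, num_apps=6, num_employees=3, rng=rng)
    child_a, child_b = single_point_crossover(population[0], population[1], rng)
    print(child_a, child_b, mutate(child_a, 0.1, 3, rng))
```

Because every gene always holds a valid employee number, any chromosome produced by these operators still decodes to a complete assignment, so no repair step is needed under these assumptions.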

Fitness functions and their calculation (Figure 3(E)). The tax manager specifies three objectives: minimizing the differences among employees' working times, minimizing the employees' total processing time, and minimizing the taxpayers' total waiting time. The present disclosure has one fitness function for each objective. The differences among working times are quantified as the standard deviation of the employees' working times. The three fitness functions can be computed directly once the following are known: the sequence of employees who will process the M applications (i.e., the chromosome), the sequence of submission times of the M applications, the sequence of tax types of the M applications, and the sequence of processing times of the M applications.
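One way the three fitness values might be computed from these sequences is sketched below. Several points are assumptions made for illustration and are not stated in the disclosure: each employee handles the assigned applications one at a time in submission order, waiting time is the gap between submission and the start of processing, processing time is looked up from a hypothetical table `proc_minutes` keyed by (employee, tax type), and the sample standard deviation is used.

```python
import statistics
from typing import Dict, Sequence, Tuple


def evaluate(chromosome: Sequence[int],
             submit_times: Sequence[float],
             tax_types: Sequence[str],
             proc_minutes: Dict[Tuple[int, str], float],
             num_employees: int) -> Tuple[float, float, float]:
    """Return (workload std dev, total processing time, total waiting time).

    Times are in minutes; employees are numbered 1..num_employees and
    num_employees must be at least 2 for the sample standard deviation.
    """
    free_at = [0.0] * (num_employees + 1)   # when each employee next becomes free
    workload = [0.0] * (num_employees + 1)  # minutes worked by each employee
    total_wait = 0.0
    for emp, submitted, tax in zip(chromosome, submit_times, tax_types):
        start = max(submitted, free_at[emp])  # wait if the employee is busy
        duration = proc_minutes[(emp, tax)]
        total_wait += start - submitted
        free_at[emp] = start + duration
        workload[emp] += duration
    loads = workload[1:]  # index 0 is unused
    return statistics.stdev(loads), sum(loads), total_wait


if __name__ == "__main__":
    # Hypothetical proficiency table: minutes needed per (employee, tax type).
    table = {(1, "A"): 10, (1, "B"): 20, (2, "A"): 15, (2, "B"): 5}
    print(evaluate([1, 1, 2], [0, 0, 5], ["A", "A", "B"], table, 2))
```

All three returned values are to be minimized, so the tuple can be passed directly to a dominance comparison.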

Non-dominated sorting (Figure 3(F)). Because it has multiple objectives, a multi-objective optimization problem has a set of non-dominated solutions. The NSGA-II algorithm sorts chromosomes into non-dominated fronts of different levels. Taking two fitness values (total processing time and total waiting time) as an example, Figure 3(F) plots the two fitness values of each chromosome and shows three levels of non-dominated fronts, according to which the chromosomes can be ranked. The smaller both fitness values are, the fitter the chromosome and the better its corresponding solution. The non-dominated sorting of NSGA-II is adopted directly to find these fronts and rank the chromosomes.
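The non-dominated sorting step can be sketched as below, following the commonly described NSGA-II procedure in which each chromosome records how many others dominate it and which ones it dominates. All objectives are assumed to be minimized; the function names and the toy fitness values are illustrative.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def fast_non_dominated_sort(fitness: List[Sequence[float]]) -> List[List[int]]:
    """Group chromosome indices into fronts; fronts[0] is the best level."""
    n = len(fitness)
    dominated_by_me: List[List[int]] = [[] for _ in range(n)]
    dominators_left = [0] * n  # how many chromosomes still dominate each one
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(fitness[p], fitness[q]):
                dominated_by_me[p].append(q)
            elif dominates(fitness[q], fitness[p]):
                dominators_left[p] += 1
        if dominators_left[p] == 0:
            fronts[0].append(p)
    level = 0
    while fronts[level]:
        next_front: List[int] = []
        for p in fronts[level]:
            for q in dominated_by_me[p]:
                dominators_left[q] -= 1
                if dominators_left[q] == 0:
                    next_front.append(q)
        fronts.append(next_front)
        level += 1
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    # Hypothetical (total processing time, total waiting time) values.
    values = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(fast_non_dominated_sort(values))
```

With these five toy points the result has three levels, which mirrors the kind of three-level picture described for Figure 3(F).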

Crowding distance (Figure 3(G)). Non-dominated sorting ranks chromosomes by the level of their non-dominated front. To further rank the chromosomes within the same front, a crowding distance is computed for each chromosome, reflecting how different that chromosome is from the other chromosomes of the same level. The crowding-distance calculation of NSGA-II is likewise adopted directly.
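A sketch of the crowding-distance calculation, as it is commonly described for NSGA-II, is given below. For each objective the chromosomes of one front are sorted, the two boundary chromosomes receive an infinite distance, and each interior chromosome accumulates the normalized gap between its two neighbours. The function name and the example values are illustrative.

```python
import math
from typing import List, Sequence


def crowding_distance(front_fitness: List[Sequence[float]]) -> List[float]:
    """Crowding distance of every chromosome in a single front."""
    n = len(front_fitness)
    if n == 0:
        return []
    distance = [0.0] * n
    num_objectives = len(front_fitness[0])
    for k in range(num_objectives):
        order = sorted(range(n), key=lambda i: front_fitness[i][k])
        lowest = front_fitness[order[0]][k]
        highest = front_fitness[order[-1]][k]
        # Boundary chromosomes are always kept, so mark them as infinitely far.
        distance[order[0]] = distance[order[-1]] = math.inf
        if highest == lowest:
            continue  # this objective cannot separate the chromosomes
        for j in range(1, n - 1):
            gap = front_fitness[order[j + 1]][k] - front_fitness[order[j - 1]][k]
            distance[order[j]] += gap / (highest - lowest)
    return distance


if __name__ == "__main__":
    print(crowding_distance([(1, 5), (2, 2), (5, 1)]))
```

A larger value means the chromosome sits in a sparser region of its front, which is why selection prefers it when diversity needs to be preserved.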

Selection operator and new population generation (Figure 3(H) and Figure 3(I)). The purpose of selection is to pick fitter and more diverse chromosomes, add them to the next generation of the population, and use them as the starting point for the next round of evolution. Chromosomes are selected on the basis of non-dominated sorting and crowding distance. Specifically, chromosomes with a better non-dominated rank are added to the new generation first, until the population reaches the predefined population size. For the last front to be admitted, the selection operator may have to choose only some of the chromosomes at that level. In that case, chromosomes with a larger crowding distance are chosen, which preserves diversity and helps avoid getting trapped in local optima.
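The selection rule described above can be sketched as follows, assuming that the fronts and the crowding distances have already been computed. The function name `select_next_generation`, the dictionary of distances, and the example numbers are illustrative assumptions.

```python
from typing import Dict, List


def select_next_generation(fronts: List[List[int]],
                           crowding: Dict[int, float],
                           pop_size: int) -> List[int]:
    """Fill the next generation front by front, best level first.

    Whole fronts are admitted while they fit. The first front that does not
    fit is truncated, keeping the chromosomes with the largest crowding
    distance.
    """
    chosen: List[int] = []
    for front in fronts:
        if len(chosen) + len(front) <= pop_size:
            chosen.extend(front)
        else:
            remaining = pop_size - len(chosen)
            by_distance = sorted(front, key=lambda i: crowding[i], reverse=True)
            chosen.extend(by_distance[:remaining])
            break
    return chosen


if __name__ == "__main__":
    fronts = [[0, 1, 3], [2, 5, 6], [4]]
    # Hypothetical crowding distances for the chromosomes of the second front.
    distances = {2: 0.9, 5: 0.4, 6: 1.5}
    print(select_next_generation(fronts, distances, pop_size=5))
```

In the example, the first front fits entirely and two of the three chromosomes of the second front are admitted, chosen by crowding distance.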

Final solutions (Figure 3(J) and Figure 3(K)). After the new generation of the population is obtained, the algorithm checks the termination condition (for example, the number of iterations, or whether the objectives have been optimized to a given threshold). If the condition is not met, the algorithm starts another iteration. Otherwise, the current chromosomes are ranked and returned, either in the same way as in the selection operator or according to the tax manager's preference (for example, ranking the chromosomes by the three objectives in a specified order).

User Interface

Because tax managers do not have a technical background, the use of these algorithms is automated as far as possible. However, the algorithms still require some user involvement, including adjusting parameters, comparing results, and ranking multiple dispatching solutions. In addition, because prediction algorithms are imperfect and depend heavily on the data set, a human needs to be involved to select the most appropriate algorithm. To further lower the barrier to using the algorithms, a web application comprising a backend and a frontend has been developed. The backend mainly integrates the algorithms, and the frontend is a user interface that uses data visualization to present the prediction and scheduling results (Figure 4).

用于排班模块的界面(图4)由六个链接的面板组成，其分别允许用户上传历史数据集(A)、选择预测算法(B)、利用线图观察预测结果(C)、设置参数(D)以及获得排班结果(E和F)。具体地，六个面板包括如下的面板(A)至(F)。The interface for the scheduling module (Figure 4) consists of six linked panels, which respectively allow the user to upload historical data sets (A), select prediction algorithms (B), observe prediction results using line graphs (C), set parameters (D), and obtain scheduling results (E and F). Specifically, the six panels include the following panels (A) to (F).

在数据面板(A)中，税务管理者可以上传包含不同税务类型的申请的日期和相应计数的数据文件。在被上传之后，文件将被发送到后端以供稍后使用。In the data panel (A), the tax manager can upload a data file containing the dates and corresponding counts of applications for different tax types. After being uploaded, the file will be sent to the backend for later use.

算法选择及设置面板(B)允许税务管理者选择使用哪些算法来训练和预测，并设置必要的参数。目前，为用户提供AdaRNN和SARIMA。在设置参数之后，税务管理者可以点击“开始训练”按钮，并且后端将基于历史数据和参数来训练指定的机器学习模型。The algorithm selection and settings panel (B) allows tax managers to select which algorithms to use for training and prediction, and set the necessary parameters. Currently, AdaRNN and SARIMA are provided to users. After setting the parameters, tax managers can click the "Start Training" button, and the backend will train the specified machine learning model based on historical data and parameters.

In the visualization panel (C), the model training and prediction results are plotted as line graphs to give tax managers an intuitive basis for understanding and analysis. By observing the differences between the model's predictions and the original historical data, tax managers can qualitatively compare the strengths and weaknesses of each algorithm.

在排班设置面板(D)中，税务管理者需要设置一些专用于税务情况的参数，包括他们希望预测的天数和每个税务员工的默认工作分钟。同样，通过在可视化面板中示出的预测结果，税务管理者可以比较它们并且为每个税务类型选择最佳训练的模型。在点击“获得排班结果”按钮之后，后端将使用所选择的、训练的模型来预测每个税务类型的数据。In the Schedule Settings panel (D), the tax manager needs to set some parameters specific to the tax situation, including the number of days they want to predict and the default working minutes for each tax employee. Also, with the prediction results shown in the visualization panel, the tax manager can compare them and select the best trained model for each tax type. After clicking the "Get Schedule Results" button, the backend will use the selected, trained model to predict data for each tax type.

在排班结果面板(E)-(F)中，两个面板显示预测和排班结果。根据税务管理者的要求(R1)，显示税务类型、申请计数和在某个未来周期内要求的员工的数量。In the Scheduling Results Panel (E)-(F), two panels show the forecast and scheduling results. Based on the tax manager's request (R1), the tax type, application count, and the number of employees required in a certain future period are displayed.

本公开的系统已经用从当地税务机关获得的真实且脱敏的历史数据进行了测试。它还被税务机关的税务管理者使用。接下来，将介绍数据、实验设置和结果。The disclosed system has been tested with real and desensitized historical data obtained from local tax authorities. It is also used by tax managers of the tax authorities. Next, the data, experimental settings, and results are introduced.

原始数据由覆盖相对静态信息(例如税务类型和税务员工)和动态信息(例如历史申请)的几个表组成。本公开中使用的数据字段解释如下。The raw data consists of several tables covering relatively static information (such as tax type and tax employee) and dynamic information (such as historical applications). The data fields used in this disclosure are explained as follows.

税务类型。当地税务局中的税务业务涉及九种税务类型和许多子类，每种税务类型和子类需要不同的知识来处理。将九种税务类型称为第一级类型，将子类型称为第二级类型。该项目集中在第一级类型，并且用Type 1、Type 2、Type 3、...、Type 9来表示它们。Tax types. Tax business in the local tax bureau involves nine tax types and many sub-categories, each of which requires different knowledge to handle. The nine tax types are called first-level types, and the sub-types are called second-level types. This project focuses on the first-level types and represents them as Type 1, Type 2, Type 3, ..., Type 9.

税务员工。由当地税务机关给出的数据与五个税务员工相关。每个税务员工具有唯一的标识号，并且用他们的ID来表示它们。Tax employees. The data given by the local tax authorities is related to five tax employees. Each tax employee has a unique identification number and they are represented by their ID.

申请事件。如上所述，纳税人可以以各种方式提交税务申请表格，并且员工将处理这些申请。将申请从被提交到被处理的过程作为申请事件。本申请描述了具有五元组{申请ID、税务员工ID、税务类型、开始时间戳、结束时间戳}的事件。开始时间戳和结束时间戳包含日期和时间信息。每个事件被记录在税务机关的数据库中，形成时间序列。Application event. As mentioned above, taxpayers can submit tax application forms in various ways, and employees will process these applications. The process from application submission to application processing is regarded as an application event. This application describes an event with a five-tuple {application ID, tax employee ID, tax type, start timestamp, end timestamp}. The start timestamp and end timestamp contain date and time information. Each event is recorded in the database of the tax authority to form a time series.
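The five-tuple above maps naturally onto a small record type. The sketch below is one possible representation, with field names chosen here for illustration. It also shows how event records could be aggregated into the daily application counts per tax type that the prediction step works on; treating the interval between the two timestamps as a duration is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ApplicationEvent:
    """One application event: {application ID, employee ID, tax type, start, end}."""
    application_id: str
    employee_id: str
    tax_type: str
    start: datetime
    end: datetime

    @property
    def duration_minutes(self) -> float:
        """Minutes between the start and end timestamps."""
        return (self.end - self.start).total_seconds() / 60.0


def daily_counts(events: Iterable[ApplicationEvent]) -> Counter:
    """Count events per (calendar day, tax type)."""
    return Counter((e.start.date(), e.tax_type) for e in events)


if __name__ == "__main__":
    # Hypothetical records; IDs and timestamps are made up for illustration.
    sample = [
        ApplicationEvent("a1", "e1", "Type 1",
                         datetime(2023, 1, 3, 9, 0), datetime(2023, 1, 3, 9, 30)),
        ApplicationEvent("a2", "e2", "Type 1",
                         datetime(2023, 1, 3, 10, 0), datetime(2023, 1, 3, 10, 5)),
    ]
    print(daily_counts(sample), sample[0].duration_minutes)
```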

进行数据分析以获得对税务情况的更好理解。首先通过去除包含噪声、异常值或缺失值的事件来预处理原始申请事件数据。从样本数据集中获得386天内总共9420个历史申请事件。在实践中，日常申请计数的时间序列数据是不稳定的、申请分布是不均匀的、员工的工作量是不均匀的，并且员工具有不同的熟练度。Data analysis is performed to gain a better understanding of the tax situation. The raw application event data is first preprocessed by removing events containing noise, outliers, or missing values. A total of 9420 historical application events over 386 days are obtained from the sample dataset. In practice, the time series data of daily application counts is unstable, the application distribution is uneven, the workload of employees is uneven, and employees have different proficiency levels.

预测算法评估Prediction Algorithm Evaluation

When training the AdaRNN model on the historical data, single-step prediction is performed first, in which the data of each 24-day window is used to predict the following day's value. Because data for several future days must be predicted, the model needs to produce multiple outputs. Multi-step recursive prediction is used to forecast multiple time steps: starting from the AdaRNN single-step prediction, the value predicted at each step is fed back as input to the next step, so that prediction proceeds iteratively. The advantage of this approach is that it is faster and requires training only one model to output multiple results.
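The recursive scheme described above can be sketched independently of any particular model. In the sketch below, `one_step` is a hypothetical stand-in for a trained single-step predictor such as the AdaRNN model; it is not the AdaRNN interface. The window length of 24 follows the text, and the toy predictor in the usage example is purely illustrative.

```python
from typing import Callable, List, Sequence


def recursive_forecast(history: Sequence[float],
                       one_step: Callable[[List[float]], float],
                       window: int = 24,
                       horizon: int = 7) -> List[float]:
    """Forecast `horizon` future values with a single one-step model.

    Each prediction is appended to the input window and the oldest value is
    dropped, so later steps are conditioned on earlier predictions.
    """
    if len(history) < window:
        raise ValueError("history must contain at least `window` values")
    buffer = list(history[-window:])
    predictions: List[float] = []
    for _ in range(horizon):
        value = one_step(buffer)
        predictions.append(value)
        buffer = buffer[1:] + [value]
    return predictions


if __name__ == "__main__":
    # Toy predictor: the mean of the window (illustration only).
    daily_counts = [20.0 + (d % 7) for d in range(30)]
    print(recursive_forecast(daily_counts, lambda w: sum(w) / len(w), horizon=3))
```

A known trade-off of this design is that errors made at early steps are fed into later steps, so accuracy tends to degrade as the horizon grows.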

To evaluate the trained AdaRNN model, it is compared numerically with three baselines. The first baseline is the seasonal autoregressive integrated moving average (SARIMA) model, an extension of the classic time series prediction algorithm ARIMA to time series with seasonality. Because SARIMA has seven parameters to determine, an automatic parameter selection method is used when applying SARIMA to the historical tax data. Rather than fixing specific values, a range is given for each parameter, the Cartesian product of these ranges is formed, a model is trained for each parameter combination in the product, and the model with the best result is chosen as the final trained SARIMA model. The second baseline is gradient boosted regression trees (GBRT), a machine learning method recently developed for time series prediction. Because XGBoost is an efficient implementation of GBRT with high accuracy and fast training, it is adapted into a window-based regression model to improve its efficiency in the embodiment. The third baseline is LSTNet, a commonly used baseline that uses a CNN and an RNN to capture long-term and short-term features from time series data. Model parameters such as the learning rate and batch size are tuned to the data set.
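The automatic parameter selection described for SARIMA amounts to a grid search over the Cartesian product of the parameter ranges. The sketch below shows that search in a model-agnostic form. `fit_and_score` is a hypothetical callable that would fit one model for a given parameter combination and return an error to be minimized, for instance by wrapping a SARIMA implementation; the parameter names and the toy scoring function are illustrative only.

```python
import itertools
from typing import Callable, Dict, Optional, Sequence, Tuple

Params = Dict[str, int]


def grid_search(param_ranges: Dict[str, Sequence[int]],
                fit_and_score: Callable[[Params], float]
                ) -> Tuple[Optional[Params], float]:
    """Try every combination in the Cartesian product and keep the best one."""
    names = list(param_ranges)
    best_params: Optional[Params] = None
    best_score = float("inf")
    for values in itertools.product(*(param_ranges[name] for name in names)):
        params = dict(zip(names, values))
        try:
            score = fit_and_score(params)
        except Exception:
            continue  # assumed: some combinations may fail to fit and are skipped
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    ranges = {"p": [0, 1, 2], "d": [0, 1], "q": [0, 1]}
    # Toy score standing in for a validation error (illustration only).
    print(grid_search(ranges, lambda ps: (ps["p"] - 1) ** 2 + ps["d"] + ps["q"]))
```

The number of models trained is the product of the range sizes, so with seven parameters the ranges need to be kept small for the search to remain practical.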

In the embodiment, two commonly used metrics are chosen to evaluate the performance of these time series prediction models on the historical data. The first is the mean absolute error (MAE), which reflects the actual magnitude of the prediction errors and is calculated as follows, where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of predicted points:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

The second metric is the root mean square error (RMSE), which measures the deviation between the predicted values $\hat{y}_i$ and the true values $y_i$ over $n$ predicted points and is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
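Both metrics follow their standard definitions and can be computed in a few lines. The sketch below is illustrative; the function names and the sample values are not taken from the disclosure.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: the square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    actual = [3, 5, 2]     # hypothetical daily application counts
    predicted = [2, 5, 4]  # hypothetical model output
    print(mae(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares the errors before averaging, it penalizes a few large misses more heavily than MAE does, which is one reason the two are reported together.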

表1示出了针对不同主要税务类型，四个模型在历史数据集上的度量值。结果表明，AdaRNN一般具有较好的性能。Table 1 shows the metrics of the four models on the historical dataset for different major tax types. The results show that AdaRNN generally has better performance.

表1.对包含不同税务类型的数据集在AdaRNN与三种基准算法之间的比较。Table 1. Comparison between AdaRNN and three baseline algorithms on datasets containing different tax types.

Dispatching Algorithm Evaluation

One day in the data set is chosen at random as an example to evaluate the effectiveness of the designed dispatching algorithm. On that day, four tax employees processed 70 applications. The four employees worked 190, 102, 49, and 239 minutes, respectively, for a total of 580 minutes, and the standard deviation of their working times was 85.5 minutes. The taxpayers' total waiting time was not recorded. The designed dispatching algorithm was applied to that day's application sequence and produced multiple solutions. Table 2 compares ten dispatching solutions given by the algorithm (Solution 1 to Solution 10) with the actual data for the selected day. All ten solutions yield a more balanced workload, at the cost of some increase in total working minutes. It can also be observed that the three objectives often conflict with one another and cannot all be optimal at the same time. Human intervention is therefore needed to decide which dispatching solution to use. For example, on a very busy day, the tax manager may sacrifice the objective of balancing the workload and assign more applications to the more proficient employees in pursuit of efficiency and shorter taxpayer waiting times. On a day with fewer taxpayers, the tax manager may care more about balancing the employees' working times, to avoid improper assignments that lead to overwork.
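The figures quoted for the selected day can be checked with a short calculation. The check below suggests that the reported 85.5 corresponds to the sample standard deviation (divisor n - 1) rather than the population standard deviation (divisor n); this is an inference from the numbers, not something the disclosure states.

```python
import statistics

# Working minutes of the four employees on the selected day (from the text).
minutes = [190, 102, 49, 239]

total = sum(minutes)
sample_sd = statistics.stdev(minutes)       # divisor n - 1
population_sd = statistics.pstdev(minutes)  # divisor n

print(total, round(sample_sd, 1), round(population_sd, 1))
# prints: 580 85.5 74.0
```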

Table 2. Comparison between ten potential dispatching solutions and the actual records on a selected day

In summary, the present disclosure provides a dedicated system for tax managers to schedule and dispatch employees. The system can forecast the application counts of different tax types in advance from unstable historical data in order to schedule employees, can match applications with suitable tax employees in real time in order to dispatch employees, and provides an intuitive user interface and interactions that allow managers to obtain the results.

The present disclosure adapts the genetic algorithm to the specific characteristics of the tax setting, so as to match applications with suitable tax employees under multiple objectives and to provide tax managers with several alternatives.

本公开提供了具有数据可视化的用户界面，以允许非技术性税务管理者容易地与高级算法交互。The present disclosure provides a user interface with data visualization to allow non-technical tax managers to easily interact with advanced algorithms.

根据另一方面，公开了一种非暂时性计算机可读存储介质，其上存储有计算机程序，当被执行时，所述计算机程序由处理器执行上述排班模块、调度模块的功能。According to another aspect, a non-transitory computer-readable storage medium is disclosed, on which a computer program is stored. When executed, the computer program causes a processor to perform the functions of the above-mentioned scheduling module and dispatching module.

The computer-readable storage medium may be, for example, a disk or a memory. The computer program may be stored in the computer-readable storage medium in the form of instructions encoded on the computer-readable storage medium.

上述内容是示例实施例的说明，不应解释为对其的限制。尽管已经描述了一些示例实施例，但是本领域技术人员将容易理解，在不实质上偏离示例实施例的新颖教导和优点的情况下，在示例实施例中可以进行许多修改。因此，所有这样的修改都旨在包括在权利要求中定义的示例实施例的范围内。因此，应当理解，上述内容是各种示例实施例的说明，不应被解释为限于所公开的特定示例实施例，并且对所公开的示例实施例以及一些示例实施例的修改旨在包括在所附权利要求的范围内。The foregoing is an illustration of example embodiments and should not be construed as limiting thereof. Although some example embodiments have been described, it will be readily appreciated by those skilled in the art that many modifications may be made in the example embodiments without substantially departing from the novel teachings and advantages of the example embodiments. Therefore, all such modifications are intended to be included within the scope of the example embodiments defined in the claims. Therefore, it should be understood that the foregoing is an illustration of various example embodiments and should not be construed as being limited to the specific example embodiments disclosed, and modifications to the disclosed example embodiments and some example embodiments are intended to be included within the scope of the appended claims.

Claims

1. An interactive system for scheduling and dispatching tax personnel, comprising:

A scheduling module, which is configured to obtain historical data, and predict future application counts using a time series prediction algorithm based on the historical data to output a scheduling result;

A dispatching module configured to receive input information and use a resource scheduling algorithm based on NSGA-II to match employees in the scheduling result with applications to be processed in real time to generate a dispatching result; and

A user interface configured to display the scheduling result and the dispatching result.

2. The interactive system according to claim 1, wherein the time series prediction algorithm comprises one of a SARIMA algorithm and an AdaRNN deep learning algorithm.

3. The interactive system according to claim 1, wherein the time series prediction algorithm comprises an AdaRNN deep learning algorithm.

4. The interactive system according to claim 2, wherein:

The historical data includes: tax type, average processing time for each employee to process applications of the tax type, and historical daily application counts.

5. The interactive system according to claim 2, wherein:

The historical daily application count is a non-stationary time series that arranges applications in chronological order.

6. The interactive system according to claim 2, wherein:

The input information includes real-time employee availability, chronologically ordered applications currently pending, and each employee's proficiency with the tax type.

7. The interactive system according to claim 2, wherein the dispatching result comprises a plurality of dispatching results that respectively achieve at least one of the following objectives:

Minimize differences between employees’ working hours;

Minimize overall processing time; and

Minimize the total waiting time for taxpayers.

8. The interactive system according to claim 2, wherein, in the NSGA-II-based resource scheduling algorithm, the following processing is performed: encoding, population initialization, crossover, mutation, fitness calculation, non-dominated sorting, crowding distance calculation, selection, generating new populations, checking termination conditions, and returning multiple solutions.

9. The interactive system according to claim 2, wherein the user interface comprises:

A data panel that allows users to upload the historical data;

An algorithm selection and setting panel that allows users to select one of the SARIMA algorithm and the AdaRNN deep learning algorithm and to set the corresponding parameters;

A visualization panel that displays historical daily application counts and predicted future application counts in the form of a line graph;

A scheduling setting panel that allows users to set parameters related to scheduling; and

A scheduling result panel that displays the scheduling result in the form of a line graph.

10. The interactive system according to claim 2, wherein the parameters related to scheduling include the default working minutes for each employee and the number of days to be forecast.

11. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the processor performs the following operations:

Acquiring historical data, and using a time series prediction algorithm based on the historical data to predict future application counts to output a scheduling result;

Receiving input information, and using a resource scheduling algorithm based on NSGA-II to match employees in the scheduling result with applications to be processed in real time to generate a dispatching result; and

Displaying the scheduling result and the dispatching result to the user.