CN107798111A

CN107798111A - Method for exporting data in bulk in a distributed environment

Info

Publication number: CN107798111A
Application number: CN201711059530.3A
Authority: CN
Inventors: 李波; 岳永胜
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2017-11-01
Filing date: 2017-11-01
Publication date: 2018-03-13
Anticipated expiration: 2037-11-01
Also published as: CN107798111B

Abstract

The invention discloses a method for exporting data in bulk in a distributed environment, which addresses the shortcomings of traditional data export. Multiple threads first query a single table, and multiple threads then query the associated tables using the data already retrieved, which speeds up the export and shortens the time the user waits. By applying multithreading repeatedly, the invention shortens user waiting time while also reducing the load on the database server and the application server.

Description

A method for exporting data in bulk in a distributed environment

Technical field

The present invention relates to the technical field of data processing, and in particular to a method for exporting data in bulk in a distributed environment.

Background

As software is continuously upgraded, requirements grow, and traditional design approaches can no longer meet business needs. A distributed development model must therefore be adopted, with the business split into separate parts and the database layer designed around multiple databases and multiple tables. Cloud platform operators or enterprise administrators need to export and analyze operational data at any time. Software developers can build functions that meet the analysts' requirements, but every requirement passes through a series of tedious stages between being raised and going live: requirements analysis, outline design, coding, testing, bug fixing, and release. Markets, however, change quickly, and opportunities vanish with time, so analyzing operational data promptly has become essential for every website operator.

In routine work, operators prefer to export data to Excel spreadsheets and perform visual operations on it there. Exporting large volumes of data has therefore become a problem that developers must face. A traditional architecture uses a single database and a single server, so data can be queried directly and exported to Excel. In a distributed environment, however, the traditional approach cannot export large volumes of data. The main problem is that the data is split across multiple databases: the traditional approach relies on join queries against one database, and join queries are no longer possible once the data has been split across databases.

As the above shows, exporting in a distributed environment in the traditional way forces the user to wait a long time, and the system faces M*N queries, where M is the number of records queried and N is the number of tables involved. For example, exporting 10,000 records that involve 3 tables requires 30,000 database queries. Under high concurrency, the database will inevitably collapse. The traditional export approach therefore does nothing to improve export speed or to reduce database query time.
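The M*N figure above can be checked with a few lines of Java. This is only an arithmetic sketch: the class and method names are made up for illustration, and the page size of 1,000 used for the batched comparison is an assumption, not a value from the source.

```java
final class QueryCountEstimate {
    /** Row-by-row lookups: one query per exported record per table involved (M * N). */
    static long rowByRowQueries(long records, int tables) {
        return records * tables;
    }

    /** Batched lookups: one query per page per table; pageSize is an assumed parameter. */
    static long batchedQueries(long records, int tables, int pageSize) {
        long pages = (records + pageSize - 1) / pageSize; // ceiling division
        return pages * tables;
    }

    public static void main(String[] args) {
        System.out.println(rowByRowQueries(10_000, 3));       // M * N from the text
        System.out.println(batchedQueries(10_000, 3, 1_000)); // assumed page size
    }
}
```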

Summary of the invention

The purpose of the present invention is to overcome the deficiencies described in the background above by providing a method for exporting data in bulk in a distributed environment. Multiple threads first query a single table, and multiple threads then query the associated tables using the data retrieved, which effectively speeds up data export and shortens user waiting time.

To achieve the above technical effect, the present invention adopts the following technical solution:

A method for exporting data in bulk in a distributed environment, comprising the following steps:

A. The web front end sends the conditions for the data to be exported; an aggregate query determines the number of records that satisfy the user's export conditions, and the number of records each thread is to query is calculated;

B. A thread task implementing the java.util.concurrent.Callable interface is created; this task fetches the order table data page by page;

C. A thread pool is created according to the configuration of the server, the query tasks are added to the thread pool, and an optimal number of threads is preset for each thread pool; the query tasks are invoked automatically according to the scheduling strategy of the thread pool to query the order table, and the order table data obtained is saved to a first file;

D. Associated queries are performed on the data: query tasks corresponding one-to-one with the order table are established, the data returned by the single-table query is paged, and the data retrieved is held in memory in the form of a Map of keys and values, where each key is the id of an order record and each value is the data that must be fetched from the associated table;

E. The data in the first file is assembled with the Map data obtained in step D into a List collection that meets the export requirement, and the List collection is stored in a second file;

F. The data in the second file is parsed, read in the form of an IO stream, and written to an Excel file;

G. An upper limit on the number of Excel rows is preset for the Excel file; when data is being written to the Excel file in step F and the number of rows written reaches this upper limit, the data held in memory in Map form is written to the hard disk;

H. Using the HTTP protocol, the file data is written to the response output stream of the web front end's HTTP request, completing the bulk data export.
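Steps B and C can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory list `fakeOrderTable` stands in for the real order table, the class and method names are hypothetical, and writing the result to the first file is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PagedOrderQuery {
    /** Hypothetical stand-in for the order table: order ids 1..total. */
    static List<Integer> fakeOrderTable(int total) {
        List<Integer> rows = new ArrayList<>();
        for (int id = 1; id <= total; id++) rows.add(id);
        return rows;
    }

    /** Step B: a Callable query task that returns one page of the order table. */
    static Callable<List<Integer>> pageTask(List<Integer> table, int page, int pageSize) {
        return () -> {
            int from = Math.min(page * pageSize, table.size());
            int to = Math.min(from + pageSize, table.size());
            return new ArrayList<>(table.subList(from, to));
        };
    }

    /** Step C: submit one task per page to a fixed-size pool and collect the pages in order. */
    static List<Integer> queryAll(List<Integer> table, int pageSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int pages = (table.size() + pageSize - 1) / pageSize;
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (int p = 0; p < pages; p++) {
                futures.add(pool.submit(pageTask(table, p, pageSize)));
            }
            List<Integer> all = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                try {
                    all.addAll(f.get()); // blocks until that page has been fetched
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the pages in their original order even though the tasks may finish out of order.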

Further, the order table data in step B is obtained as follows: a data query method is encapsulated in the Service layer and calls JPA to run the query using SQL statements; the controller layer passes the paging parameters and query conditions to the Service layer to obtain the paged query data.

Further, the optimal number of threads preset for the thread pool in step C is calculated as follows: optimal number of threads = (thread wait time / thread CPU time + 1) * number of CPUs.
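The formula can be written as a small helper. This is a sketch: the rounding to the nearest integer, the lower bound of one thread, and the names are assumptions added for illustration; the source gives only the formula itself.

```java
final class ThreadCountFormula {
    /**
     * optimal threads = (wait time / CPU time + 1) * number of CPUs.
     * Rounding and the minimum of one thread are assumptions, not part of the source.
     */
    static int optimalThreads(double waitTimeMs, double cpuTimeMs, int cpuCount) {
        if (waitTimeMs < 0 || cpuTimeMs <= 0 || cpuCount <= 0) {
            throw new IllegalArgumentException("wait time must be >= 0, CPU time and CPU count > 0");
        }
        long threads = Math.round((waitTimeMs / cpuTimeMs + 1) * cpuCount);
        return (int) Math.max(1L, threads);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Example figures only: a task that waits 90 ms on IO for every 10 ms of CPU work.
        System.out.println(optimalThreads(90, 10, cpus));
    }
}
```

With a 90 ms wait and 10 ms of CPU time per task on a 4-CPU server, the formula gives (90 / 10 + 1) * 4 = 40 threads; a task that never waits gets one thread per CPU.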

Further, the scheduling strategy of the thread pool in step C is: the threads added to the thread pool first are run first; when the number of threads exceeds the preset number of threads, the excess threads wait in a waiting queue for their turn to run.
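This first-in, queue-the-rest behaviour matches what a `ThreadPoolExecutor` with equal core and maximum sizes and an unbounded `LinkedBlockingQueue` does, so it can be observed directly. The sketch below assumes that configuration; the source does not say which pool implementation is used, and the class name is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class QueueingPoolDemo {
    /** Returns {tasks waiting in the queue while the pool is saturated, tasks completed}. */
    static int[] run(int poolSize, int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // hold the worker so later tasks have to queue
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                done.incrementAndGet();
            });
        }
        // The first poolSize tasks occupy the workers; the rest sit in the FIFO queue.
        int queued = pool.getQueue().size();
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {queued, done.get()};
    }
}
```

Calling `QueueingPoolDemo.run(2, 5)` starts two tasks immediately and leaves three in the queue until a worker becomes free; all five complete once the latch is released.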

Further, the export requirement in step E is that the exported data includes at least order information, creator information, and company information.

Further, when data is written to the Excel file in step G, POI's SXSSFWorkbook is mainly used.

Compared with the prior art, the present invention has the following beneficial effects:

In the method for exporting data in bulk in a distributed environment according to the present invention, the processing logic uses multithreading repeatedly, reducing IO wait time to a minimum. Multiple threads query a single table, and multiple threads then query the associated tables using the data retrieved, which effectively speeds up data export, shortens user waiting time, and also helps reduce the load on the database server and the application server.

Brief description of the drawings

Fig. 1 is a schematic flow diagram of one embodiment of the method for exporting data in bulk in a distributed environment according to the present invention.

Embodiment

The invention is further described below with reference to an embodiment.

Embodiment:

As shown in Fig. 1, a method for exporting data in bulk in a distributed environment specifically comprises the following steps:

S101: The web front end sends the conditions for the data to be exported;

S102: The web front end sends the data request, carrying the export conditions, to the controller method of the corresponding back-end interface, which then queries the total number of records to be exported according to the export conditions;

S103: From the total number of records, the number of threads needed for the single-table export is calculated as: number of threads = total number of records / number of records each thread can export. The number of records each thread can handle is determined by testing the program for the maximum load a single query can sustain. The threads created, i.e. the query tasks, are placed in the thread pool to query the order table.
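The split in S103 can be illustrated as below. This is a sketch with hypothetical names; rounding the division up so that a partial last page still gets a task is an assumption, since the source states only the plain division.

```java
final class ExportTaskSplit {
    /** Number of query tasks: total records divided by records per task, rounded up (assumed). */
    static int taskCount(long totalRecords, int recordsPerTask) {
        if (totalRecords < 0 || recordsPerTask <= 0) {
            throw new IllegalArgumentException("totalRecords must be >= 0 and recordsPerTask > 0");
        }
        return (int) ((totalRecords + recordsPerTask - 1) / recordsPerTask);
    }

    /** One {offset, limit} pair per task, suitable as paging parameters for a query. */
    static long[][] pageRanges(long totalRecords, int recordsPerTask) {
        int tasks = taskCount(totalRecords, recordsPerTask);
        long[][] ranges = new long[tasks][2];
        for (int i = 0; i < tasks; i++) {
            long offset = (long) i * recordsPerTask;
            ranges[i][0] = offset;
            ranges[i][1] = Math.min(recordsPerTask, totalRecords - offset); // last page may be short
        }
        return ranges;
    }
}
```

For 10,000 records and an assumed capacity of 3,000 records per task, this yields four tasks, the last of which covers the remaining 1,000 records starting at offset 9,000.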

S104: The large volume of order table data retrieved is stored in the file system, and the number of threads needed to complete data assembly is calculated again.

For example, for an order export requirement, the user wants to see order information, creator information, and company information in Excel, and these three kinds of information are stored in three tables in three separate databases. The order data is queried first and stored in file system A; the order IDs are then extracted and stored in a List. Because two more tables must be queried, two further thread pools are created to fetch the creator information and the company information in the same way the orders were fetched, and the creator information and company information retrieved are stored in two Maps.

Each thread assembles one batch of data, and when an assembly thread starts, the threads for the associated data tables are started with it. Each association thread fetches its data in batches using IN queries, and the results are then assembled into the format expected by Excel; for example, if the Excel sheet contains order information, creator information, and company information, the three parts are assembled into one row.
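The Map-based assembly described above might look like the following sketch. The `Order` class, its fields, the map names, and the choice of an empty string for missing associated data are all assumptions for illustration; the source specifies only that the Maps are keyed by order id and that the three parts end up in one row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ExportRowAssembler {
    /** Minimal order record; the field names are illustrative only. */
    static final class Order {
        final long id;
        final String orderNo;

        Order(long id, String orderNo) {
            this.id = id;
            this.orderNo = orderNo;
        }
    }

    /** Builds the "IN (?, ?, ...)" placeholder list for one batch of order ids. */
    static String inPlaceholders(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        StringBuilder sb = new StringBuilder("IN (");
        for (int i = 0; i < batchSize; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")").toString();
    }

    /** Joins each order with the associated data looked up by order id, one row per order. */
    static List<String[]> assemble(List<Order> orders,
                                   Map<Long, String> creatorByOrderId,
                                   Map<Long, String> companyByOrderId) {
        List<String[]> rows = new ArrayList<>(orders.size());
        for (Order o : orders) {
            rows.add(new String[] {
                o.orderNo,
                creatorByOrderId.getOrDefault(o.id, ""), // empty cell if nothing was found
                companyByOrderId.getOrDefault(o.id, "")
            });
        }
        return rows;
    }
}
```

Because each lookup is a constant-time Map access, the join runs in the application rather than in the database, which is what allows the three tables to live in separate databases.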

S105: The data in the file is read as a stream and written to the Excel sheet. A row-count threshold is set for the Excel sheet, and when the threshold is reached the data is automatically written to the hard disk; this repeats in a loop until the whole batch has been exported to Excel. The file data is then written, using the HTTP protocol, to the response output stream of the web front end's HTTP request, completing the bulk data export.
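The threshold-and-flush loop in S105 can be illustrated with the standard library alone. The sketch below is a stand-in, not POI: it writes comma-separated lines to a temporary file instead of an Excel sheet, does no cell escaping, and all names are hypothetical. POI's SXSSFWorkbook offers a comparable row access window that keeps only a bounded number of rows in memory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class WindowedRowWriter implements AutoCloseable {
    private final BufferedWriter out;
    private final int threshold;
    private final List<String> window = new ArrayList<>();
    private int flushes;

    WindowedRowWriter(Path target, int threshold) {
        if (threshold <= 0) throw new IllegalArgumentException("threshold must be positive");
        try {
            this.out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.threshold = threshold;
    }

    /** Buffers one row in memory and flushes to disk once the threshold is reached. */
    void writeRow(String... cells) {
        window.add(String.join(",", cells)); // naive CSV: no quoting or escaping
        if (window.size() >= threshold) flush();
    }

    private void flush() {
        try {
            for (String line : window) {
                out.write(line);
                out.newLine();
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        window.clear();
        flushes++;
    }

    int flushCount() {
        return flushes;
    }

    @Override
    public void close() {
        if (!window.isEmpty()) flush(); // write the final partial window
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the given number of rows to a temp file; returns {flush count, lines on disk}. */
    static int[] demo(int rows, int threshold) {
        try {
            Path tmp = Files.createTempFile("export-", ".csv");
            WindowedRowWriter writer = new WindowedRowWriter(tmp, threshold);
            for (int i = 1; i <= rows; i++) {
                writer.writeRow("order-" + i, "creator-" + i, "company-" + i);
            }
            writer.close();
            int lines = Files.readAllLines(tmp, StandardCharsets.UTF_8).size();
            Files.delete(tmp);
            return new int[] {writer.flushCount(), lines};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Memory use is bounded by the threshold rather than by the size of the export, which is the point of flushing to disk at a fixed row count.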

It should be understood that the above embodiment is merely an exemplary implementation used to illustrate the principle of the present invention, and the invention is not limited to it. A person of ordinary skill in the art may make various variations and improvements without departing from the spirit and essence of the invention, and such variations and improvements are also regarded as falling within the protection scope of the invention.

Claims

1. A method for exporting data in bulk in a distributed environment, characterized by comprising the following steps:

A. The web front end sends the conditions for the data to be exported; an aggregate query determines the number of records that satisfy the user's export conditions, and the number of records each thread is to query is calculated;

B. A thread task implementing the java.util.concurrent.Callable interface is created; this task fetches the order table data page by page;

C. A thread pool is created according to the configuration of the server, the threads, i.e. the query tasks, are added to the thread pool, and an optimal number of threads is preset for each thread pool; the query tasks are invoked automatically according to the scheduling strategy of the thread pool to query the order table, and the order table data obtained is saved to a first file;

D. Associated queries are performed on the data: query tasks corresponding one-to-one with the order table are established, the data returned by the single-table query is paged, and the data retrieved is held in memory in the form of a Map of keys and values, where each key is the id of an order record and each value is the data that must be fetched from the associated table;

E. The data in the first file is assembled with the Map data obtained in step D into a List collection that meets the export requirement, and the List collection is stored in a second file;

F. The data in the second file is parsed, read in the form of an IO stream, and written to an Excel file;

G. An upper limit on the number of Excel rows is preset for the Excel file; when data is being written to the Excel file in step F and the number of rows written reaches this upper limit, the data held in memory in Map form is written to the hard disk;

H. Using the HTTP protocol, the file data is written to the response output stream of the web front end's HTTP request, completing the bulk data export.

2. The method for exporting data in bulk in a distributed environment according to claim 1, characterized in that the order table data in step B is obtained as follows: a data query method is encapsulated in the Service layer and calls JPA to run the query using SQL statements; the controller layer passes the paging parameters and query conditions to the Service layer to obtain the paged query data.

3. The method for exporting data in bulk in a distributed environment according to claim 1, characterized in that the optimal number of threads preset for the thread pool in step C is calculated as follows: optimal number of threads = (thread wait time / thread CPU time + 1) * number of CPUs.

4. The method for exporting data in bulk in a distributed environment according to claim 1, characterized in that the scheduling strategy of the thread pool in step C is: the threads added to the thread pool first are run first; when the number of threads exceeds the preset number of threads, the excess threads wait in a waiting queue for their turn to run.

5. The method for exporting data in bulk in a distributed environment according to claim 1, characterized in that the export requirement in step E is that the exported data includes at least order information, creator information, and company information.

6. The method for exporting data in bulk in a distributed environment according to claim 1, characterized in that, when data is written to the Excel file in step G, POI's SXSSFWorkbook is mainly used.