CN107798111A - Method for exporting data in large batch in distributed environment - Google Patents
- Publication number
- CN107798111A (application number CN201711059530.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- thread
- file
- excel
- distributed environment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5011—Pool
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5018—Thread allocation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for exporting data in large batches in a distributed environment, which improves on the shortcomings of traditional data export. Single tables are first queried using multiple threads, and the data obtained is then used to query the associated tables, again using multiple threads, which speeds up data export and shortens the user's waiting time. By using multithreading repeatedly, the invention shortens the user's waiting time while also reducing the load on the database server and the application server.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a method for exporting data in large batches in a distributed environment.
Background art
As software is continually upgraded and requirements grow, traditional design approaches can no longer meet business needs. A distributed development model must therefore be adopted, with the business split into separate parts, and the database layer is designed with multiple databases and multiple tables. Cloud platform operators or enterprise administrators need to export and analyze operational data at any time. Software developers can build functions that meet the requirements of data analysts, but from the moment a requirement is raised until it goes live it must pass through a series of tedious steps: requirements analysis, outline design, coding, testing, bug fixing and release. Markets, however, change quickly and opportunities disappear with time, so timely analysis of operational data has become essential for every website operator.
In day-to-day work, operators prefer to export data to Excel spreadsheets and carry out various visual operations on it there. Exporting large volumes of data has therefore become a problem developers must face. A traditional architecture uses a single database on a single server, so the data can simply be queried directly and exported to Excel. In a distributed environment, however, large volumes of data cannot be exported in the traditional way. The main problem in a distributed environment is that the data is split across multiple databases: the traditional approach runs join queries against one database, and join queries cannot be performed once the data has been split across databases.
It follows that when data is exported in the traditional way in a distributed environment, the user faces a long wait and the system faces an M*N query problem (M being the number of records queried and N the number of tables involved). For example, exporting 10,000 records that involve 3 tables requires 30,000 database queries. Under high concurrency, the database is bound to collapse. The traditional export approach is therefore ill-suited to increasing data export speed and reducing database query time.
Summary of the invention
The purpose of the present invention is to overcome the deficiencies of the background art described above and to provide a method for exporting data in large batches in a distributed environment, in which single tables are first queried using multiple threads and the data obtained is then used to query the associated tables, again using multiple threads, thereby effectively speeding up data export and shortening the user's waiting time.
To achieve the above technical effect, the present invention adopts the following technical solution:
A method for exporting data in large batches in a distributed environment, comprising the following steps:
A. The web front end sends the conditions for the data to be exported; the number of records that meet the user's export conditions is obtained by an aggregate query, and the number of records each thread is to query is calculated;
B. A thread class implementing the java.util.concurrent.Callable interface is created, which is used to obtain the data of the order table page by page;
C. A thread pool is created according to the configuration of the server, the query tasks are added to the thread pool, and an optimal number of threads is preset for each thread pool; the query tasks are invoked automatically according to the scheduling strategy of the thread pool to query the order table, and the order table data obtained by the query is saved to a first file;
D. Associated queries are performed on the data: query tasks in one-to-one correspondence with the order table are established, the data queried from each single table is paged, and the queried data is stored in memory in the form of a Map of key and value pairs, where the key is the id of the order record and the value is the data that needs to be obtained from the associated table;
E. The data in the first file is assembled with the Map data obtained in step D into a List that meets the export requirement, and the List is stored in a second file;
F. The data in the second file is parsed, read in the form of an IO stream, and written into an Excel file;
G. An upper limit on the number of Excel rows is preset for the Excel file; while data is being written into the Excel file in step F, if the number of rows written reaches this upper limit, the Map data held in memory is written to the hard disk;
H. Using the HTTP protocol, the file data is written into the response output stream of the web front end's HTTP request, completing the export of the large batch of data.
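Steps A to C can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class and method names (`PagedOrderQuerySketch`, `OrderPageTask`, `taskCount`, `fetchAll`) are hypothetical, the "query" only generates order ids instead of calling a database, and the pages are returned in memory rather than saved to the first file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: names are hypothetical and the "query"
// generates order ids instead of reading a real order table.
final class PagedOrderQuerySketch {

    /** One query task: fetches a single page [offset, offset + limit) of the order table. */
    static final class OrderPageTask implements Callable<List<Long>> {
        private final long offset;
        private final long limit;

        OrderPageTask(long offset, long limit) {
            this.offset = offset;
            this.limit = limit;
        }

        @Override
        public List<Long> call() {
            List<Long> page = new ArrayList<>();
            for (long id = offset; id < offset + limit; id++) {
                page.add(id); // stand-in for one row returned by the paged query
            }
            return page;
        }
    }

    /** Step A: number of tasks = ceil(total / recordsPerTask). */
    static int taskCount(long total, long recordsPerTask) {
        return (int) ((total + recordsPerTask - 1) / recordsPerTask);
    }

    /** Steps B and C: submit one paging task per slice and collect the pages in order. */
    static List<Long> fetchAll(long total, long recordsPerTask, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (long offset = 0; offset < total; offset += recordsPerTask) {
                long limit = Math.min(recordsPerTask, total - offset);
                futures.add(pool.submit(new OrderPageTask(offset, limit)));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> future : futures) {
                all.addAll(future.get()); // blocks until that page is ready
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a page", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a page query failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the pages in order table order even though the tasks themselves may finish in any order.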
Further, when the data of the order table is obtained in step B, a data query method is encapsulated in the Service layer; this method calls JPA and queries using SQL statements, and the paging parameters and query conditions are passed from the controller layer to the Service layer to obtain the paged query data.
Further, when the optimal number of threads is preset for the thread pool in step C, it is calculated as follows: optimal number of threads = (thread waiting time / thread CPU time + 1) * number of CPUs.
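The formula above can be evaluated directly. The following is a small sketch, assuming both times are measured in the same unit and that the result is rounded to the nearest whole thread (the rounding rule and the names `ThreadCountSketch` and `optimalThreads` are assumptions, not part of the patent text).

```java
// Illustrative sketch only: the rounding rule and all names are assumptions.
final class ThreadCountSketch {

    /**
     * optimal threads = (thread waiting time / thread CPU time + 1) * number of CPUs.
     * Both times must use the same unit, for example milliseconds.
     */
    static int optimalThreads(double waitTime, double cpuTime, int cpuCount) {
        if (cpuTime <= 0 || cpuCount <= 0) {
            throw new IllegalArgumentException("cpuTime and cpuCount must be positive");
        }
        long threads = Math.round((waitTime / cpuTime + 1) * cpuCount);
        return (int) Math.max(1, threads);
    }

    public static void main(String[] args) {
        // A task that waits 90 ms on I/O for every 10 ms of CPU work, on 4 CPUs.
        System.out.println(optimalThreads(90, 10, 4));
    }
}
```

With 90 ms of waiting per 10 ms of CPU work on 4 CPUs, the formula gives (9 + 1) * 4 = 40 threads; a purely CPU-bound task (waiting time 0) gets one thread per CPU. In practice the CPU count could be taken from `Runtime.getRuntime().availableProcessors()`.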
Further, the scheduling strategy of the thread pool in step C is: the threads added to the thread pool first are run first; when the number of threads exceeds the preset number of threads, the excess threads wait in a waiting queue for their turn to run.
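This queueing behaviour matches what a fixed-size `java.util.concurrent.ThreadPoolExecutor` with an unbounded `LinkedBlockingQueue` does, which is one plausible way to realize the strategy (the patent does not name the executor class, so this choice is an assumption). The sketch below submits more tasks than there are threads and reports how many threads exist and how many tasks are waiting; `PoolQueueSketch` and `observe` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: names are hypothetical.
final class PoolQueueSketch {

    /** Returns {threads created, tasks waiting in the queue} while every worker is busy. */
    static int[] observe(int poolSize, int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until the snapshot is taken
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The first poolSize tasks each started a worker; the rest sit in the queue.
            return new int[] { pool.getPoolSize(), pool.getQueue().size() };
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] snapshot = observe(2, 5);
        System.out.println(snapshot[0] + " threads, " + snapshot[1] + " queued");
    }
}
```

With a pool of 2 threads and 5 tasks, two tasks run immediately and three wait in the queue in submission order.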
Further, the export requirement in step E is that the exported data contains at least order information, creator information and company information.
Further, when data is written into the Excel file in step G, POI's SXSSFWorkbook is mainly used.
Compared with the prior art, the present invention has the following beneficial effects:
In the method for exporting data in large batches in a distributed environment according to the present invention, the processing logic uses multithreading repeatedly so that I/O waiting time is reduced to a minimum. Single tables are queried using multiple threads, and the data obtained is then used to query the associated tables, again using multiple threads, which effectively speeds up data export, shortens the user's waiting time, and also helps reduce the load on the database server and the application server.
Brief description of the drawings
Fig. 1 is a schematic flow chart of one embodiment of the method for exporting data in large batches in a distributed environment according to the present invention.
Detailed description of the embodiments
The invention is further described below with reference to an embodiment.
Embodiment:
As shown in Fig. 1, a method for exporting data in large batches in a distributed environment specifically comprises the following steps:
S101: The web front end sends the conditions for the data to be exported;
S102: The web front end sends the data acquisition request carrying the export conditions to the controller method corresponding to the back-end interface, and the total number of records to be exported is then queried according to the export conditions;
S103: From the total number of records found, the number of threads needed for the single-table export is calculated as: number of threads = total number of records / number of records each thread can export. The number of records each thread can handle is determined by load-testing the program to find the maximum number of records a thread can query. The threads created, that is, the query tasks, are placed in the thread pool to query the order table.
S104: The large volume of order table data obtained by the query is stored in the file system, and the number of threads needed to complete the data assembly is calculated again.
For example, suppose an order export requires that the Excel file show order information, creator information and company information, and that these three kinds of information are stored in three tables in three separate databases. The order data is queried first and stored in file system A; the order IDs are then extracted and stored in a List. Because two more tables still need to be queried, two further thread pools must be created, which obtain the creator information and the company information in the same way the orders were obtained. Once retrieved, the creator information and the company information are stored in two Maps, respectively.
Here, each thread completes the assembly of one batch of data, and when the threads are started, the threads for the associated data tables are started at the same time. Each association thread queries its data in batches using IN queries, and the data is then assembled into the form the Excel file requires. For example, if the Excel file contains order information, creator information and company information, the three existing parts of the data are assembled into one row.
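The assembly described above amounts to a lookup join keyed on the order id. A minimal sketch follows, assuming each record is reduced to a single string and that a missing association is exported as an empty cell; both simplifications, and the names `RowAssemblySketch`, `assemble` and `demo`, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names and the string-per-record model are assumptions.
final class RowAssemblySketch {

    /**
     * Builds one export row per order: {order info, creator info, company info}.
     * Both lookup maps are keyed by order id, as in step D of the method.
     */
    static List<String[]> assemble(Map<Long, String> orders,
                                   Map<Long, String> creatorByOrderId,
                                   Map<Long, String> companyByOrderId) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<Long, String> order : orders.entrySet()) {
            Long orderId = order.getKey();
            rows.add(new String[] {
                order.getValue(),
                creatorByOrderId.getOrDefault(orderId, ""), // empty cell if nothing was found
                companyByOrderId.getOrDefault(orderId, "")
            });
        }
        return rows;
    }

    /** Tiny made-up data set showing the shape of the result. */
    static List<String[]> demo() {
        Map<Long, String> orders = new LinkedHashMap<>(); // keeps the order table's row order
        orders.put(1L, "order-1");
        orders.put(2L, "order-2");
        Map<Long, String> creators = new HashMap<>();
        creators.put(1L, "alice");
        Map<Long, String> companies = new HashMap<>();
        companies.put(1L, "acme");
        companies.put(2L, "globex");
        return assemble(orders, creators, companies);
    }
}
```

Because each lookup is a constant-time Map access, assembling M orders against N associated tables needs no further database queries once the Maps have been filled by the batched IN queries.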
S105: The data in the file is read as a stream, and the data read is written into the Excel spreadsheet. A row-count threshold is set for the Excel spreadsheet; when the threshold is reached, the data is automatically written to the hard disk. This is repeated until the whole batch has been exported to Excel. The file data is then written, using the HTTP protocol, into the response output stream of the web front end's HTTP request, completing the export of the large batch of data.
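The threshold-and-flush behaviour in S105 can be illustrated without any spreadsheet library. The sketch below is a plain-Java stand-in, not POI code: it buffers rows as text lines and appends them to a temporary file whenever the buffer reaches the threshold, which mimics the idea of keeping only a bounded number of rows in memory. The class `FlushingRowWriter`, its methods, and the use of CSV-style lines are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: this is not Apache POI, and all names are hypothetical.
final class FlushingRowWriter {
    private final Path target;
    private final int threshold;
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0;

    FlushingRowWriter(Path target, int threshold) {
        this.target = target;
        this.threshold = threshold;
    }

    /** Buffers one row and flushes automatically once the threshold is reached. */
    void addRow(String line) {
        buffer.add(line);
        if (buffer.size() >= threshold) {
            flush();
        }
    }

    /** Appends the buffered rows to the file on disk and empties the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        try {
            Files.write(target, buffer, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
        flushes++;
    }

    /** Writes rowCount rows; returns {automatic flushes, rows left in memory, lines on disk}. */
    static int[] demo(int rowCount, int threshold) {
        try {
            Path file = Files.createTempFile("export-sketch", ".csv");
            try {
                FlushingRowWriter writer = new FlushingRowWriter(file, threshold);
                for (int i = 0; i < rowCount; i++) {
                    writer.addRow("order-" + i + ",creator-" + i + ",company-" + i);
                }
                int automaticFlushes = writer.flushes;
                int stillBuffered = writer.buffer.size();
                writer.flush(); // write the final partial batch
                int linesOnDisk = Files.readAllLines(file, StandardCharsets.UTF_8).size();
                return new int[] { automaticFlushes, stillBuffered, linesOnDisk };
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For 25 rows and a threshold of 10, the writer flushes automatically twice, holds 5 rows in memory before the final flush, and ends with 25 lines on disk, so memory use is bounded by the threshold rather than by the size of the export.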
It should be understood that the above embodiment is merely an exemplary embodiment used to illustrate the principle of the present invention, and the invention is not limited to it. Those of ordinary skill in the art can make various variations and improvements without departing from the spirit and essence of the invention, and such variations and improvements are also regarded as falling within the scope of protection of the invention.
Claims (6)
1. A method for exporting data in large batches in a distributed environment, characterized by comprising the following steps:
A. The web front end sends the conditions for the data to be exported; the number of records that meet the user's export conditions is obtained by an aggregate query, and the number of records each thread is to query is calculated;
B. A thread class implementing the java.util.concurrent.Callable interface is created, which is used to obtain the data of the order table page by page;
C. A thread pool is created according to the configuration of the server, the query tasks are added to the thread pool, and an optimal number of threads is preset for each thread pool; the query tasks are invoked automatically according to the scheduling strategy of the thread pool to query the order table, and the order table data obtained by the query is saved to a first file;
D. Associated queries are performed on the data: query tasks in one-to-one correspondence with the order table are established, the data queried from each single table is paged, and the queried data is stored in memory in the form of a Map of key and value pairs, where the key is the id of the order record and the value is the data that needs to be obtained from the associated table;
E. The data in the first file is assembled with the Map data obtained in step D into a List that meets the export requirement, and the List is stored in a second file;
F. The data in the second file is parsed, read in the form of an IO stream, and written into an Excel file;
G. An upper limit on the number of Excel rows is preset for the Excel file; while data is being written into the Excel file in step F, if the number of rows written reaches this upper limit, the Map data held in memory is written to the hard disk;
H. Using the HTTP protocol, the file data is written into the response output stream of the web front end's HTTP request, completing the export of the large batch of data.
2. The method for exporting data in large batches in a distributed environment according to claim 1, characterized in that, when the data of the order table is obtained in step B, a data query method is encapsulated in the Service layer; this method calls JPA and queries using SQL statements, and the paging parameters and query conditions are passed from the controller layer to the Service layer to obtain the paged query data.
3. The method for exporting data in large batches in a distributed environment according to claim 1, characterized in that, when the optimal number of threads is preset for the thread pool in step C, it is calculated as follows: optimal number of threads = (thread waiting time / thread CPU time + 1) * number of CPUs.
4. The method for exporting data in large batches in a distributed environment according to claim 1, characterized in that the scheduling strategy of the thread pool in step C is: the threads added to the thread pool first are run first; when the number of threads exceeds the preset number of threads, the excess threads wait in a waiting queue for their turn to run.
5. The method for exporting data in large batches in a distributed environment according to claim 1, characterized in that the export requirement in step E is that the exported data contains at least order information, creator information and company information.
6. The method for exporting data in large batches in a distributed environment according to claim 1, characterized in that, when data is written into the Excel file in step G, POI's SXSSFWorkbook is mainly used.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711059530.3A CN107798111B (en) | 2017-11-01 | 2017-11-01 | Method for exporting data in large batch in distributed environment |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711059530.3A CN107798111B (en) | 2017-11-01 | 2017-11-01 | Method for exporting data in large batch in distributed environment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN107798111A (en) | 2018-03-13 |
| CN107798111B (en) | 2021-04-06 |
Family
ID=61548636
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201711059530.3A Active CN107798111B (en) | 2017-11-01 | 2017-11-01 | Method for exporting data in large batch in distributed environment |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107798111B (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109299157A (en) * | 2018-08-27 | 2019-02-01 | 杭州安恒信息技术股份有限公司 | A data export method and device for distributed large single table |
| CN110532311A (en) * | 2019-08-14 | 2019-12-03 | 泰安协同软件有限公司 | A kind of distributed data deriving method and system based on queue |
| CN111914151A (en) * | 2020-08-11 | 2020-11-10 | 上海毅博电子商务有限责任公司 | Association table object query optimization method |
| CN113177826A (en) * | 2021-05-20 | 2021-07-27 | 青岛海信智慧生活科技股份有限公司 | Method and device for configuring commodities and cells in batch |
Citations (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6542920B1 (en) * | 1999-09-24 | 2003-04-01 | Sun Microsystems, Inc. | Mechanism for implementing multiple thread pools in a computer system to optimize system performance |
| CN101996067A (en) * | 2009-08-19 | 2011-03-30 | 阿里巴巴集团控股有限公司 | Data export method and device |
| CN102360310A (en) * | 2011-09-28 | 2012-02-22 | 中国电子科技集团公司第二十八研究所 | Multitask process monitoring method and system in distributed system environment |
| CN103034735A (en) * | 2012-12-26 | 2013-04-10 | 北京讯鸟软件有限公司 | Big data distributed file export method |
| CN103092993A (en) * | 2013-02-18 | 2013-05-08 | 五八同城信息技术有限公司 | Data exporting method and data exporting device |
| CN103412961A (en) * | 2013-09-04 | 2013-11-27 | 广东全通教育股份有限公司 | Processing method and system for real-time exporting report form of mass data |
| CN103793519A (en) * | 2014-02-14 | 2014-05-14 | 浪潮通信信息系统有限公司 | Automatic tool supporting exportation of mass data |
| US8782101B1 (en) * | 2012-01-20 | 2014-07-15 | Google Inc. | Transferring data across different database platforms |
| CN103995807A (en) * | 2013-02-16 | 2014-08-20 | 长沙中兴软创软件有限公司 | Massive data query and secondary processing method based on Web architecture |
| CN104679813A (en) * | 2013-11-28 | 2015-06-03 | 三星电子株式会社 | Data storage device, data storage method and data storage system |
| EP2777009A4 (en) * | 2011-11-10 | 2015-06-17 | Microsoft Technology Licensing Llc | Export of content items from multiple, disparate content sources |
| CN105740293A (en) * | 2014-12-12 | 2016-07-06 | 金蝶软件(中国)有限公司 | Data export method and device |
| CN106095775A (en) * | 2016-05-24 | 2016-11-09 | 中国银行股份有限公司 | A kind of method and system realizing data query or derivation |
| CN106407231A (en) * | 2015-08-03 | 2017-02-15 | 天脉聚源(北京)科技有限公司 | A data multi-thread export method and system |
| CN106776829A (en) * | 2016-11-28 | 2017-05-31 | 成都广达新网科技股份有限公司 | A kind of data guiding system and its method of work |
Non-Patent Citations (1)
| Title |
|---|
| ZX星辰: ""Java Excel SXSSFWorkbook大量数据导出"", 《HTTPS://BLOG.CSDN.NET/ZXXINGCHEN/ARTICLE/DETAILS/70159473》 * |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109299157A (en) * | 2018-08-27 | 2019-02-01 | 杭州安恒信息技术股份有限公司 | A data export method and device for distributed large single table |
| CN110532311A (en) * | 2019-08-14 | 2019-12-03 | 泰安协同软件有限公司 | A kind of distributed data deriving method and system based on queue |
| CN110532311B (en) * | 2019-08-14 | 2023-11-28 | 泰安协同软件有限公司 | Distributed data export method and system based on queues |
| CN111914151A (en) * | 2020-08-11 | 2020-11-10 | 上海毅博电子商务有限责任公司 | Association table object query optimization method |
| CN113177826A (en) * | 2021-05-20 | 2021-07-27 | 青岛海信智慧生活科技股份有限公司 | Method and device for configuring commodities and cells in batch |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107798111B (en) | 2021-04-06 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN107480198B (en) | Distributed NewSQL database system and full-text retrieval method | |
| JP6223569B2 (en) | Computer apparatus, method and apparatus for scheduling business flows | |
| CN112651826B (en) | Credit line management and control system, method and readable storage medium | |
| CN100594498C (en) | Mass data real time processing structure and real time need-based processing platform used for the structure | |
| CN112000703B (en) | Data warehousing processing method and device, computer equipment and storage medium | |
| CN107798111A (en) | A kind of method that data are in high volume exported in distributed environment | |
| CN103970520A (en) | Resource management method and device in MapReduce framework and framework system with device | |
| CN104123340A (en) | Table-by-table and page-by-page query method and system for database | |
| CN102043625A (en) | Workflow operation method and system | |
| CN107798038A (en) | Data response method and data response apparatus | |
| WO2019047441A1 (en) | Communication optimization method and system | |
| CN107800899A (en) | Attend a banquet and method, apparatus, equipment and the computer-readable recording medium of service are provided | |
| CN111400465A (en) | Generation method, device, electronic device and medium of customer service robot | |
| CN109241384A (en) | A visualization method and device for scientific research information | |
| US9292405B2 (en) | HANA based multiple scenario simulation enabling automated decision making for complex business processes | |
| CN109597825B (en) | Rule engine calling method, device, equipment and computer readable storage medium | |
| CN102567493A (en) | Data report system and report generation method with dynamic data source | |
| CN111552546B (en) | Task implementation method and device based on multithreading and storage medium | |
| WO2022253165A1 (en) | Scheduling method, system, server and computer readable storage medium | |
| CN111552569A (en) | System resource scheduling method, device and storage medium | |
| JP2023184397A5 (en) | ||
| US12468574B2 (en) | Systems and methods for dynamically scaling remote resources | |
| CN114240188A (en) | Task allocation method and device | |
| CN104735149A (en) | Cloud computing resource management system and method | |
| CN107798056A (en) | A kind of data query method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||