
CN107798111A - Method for exporting large batches of data in a distributed environment - Google Patents


Info

Publication number: CN107798111A
Authority: CN (China)
Prior art keywords: data, thread, file, excel, distributed environment
Legal status: Granted
Application number: CN201711059530.3A
Other languages: Chinese (zh)
Other versions: CN107798111B (en)
Inventors: 李波, 岳永胜
Current assignee: Sichuan Changhong Electric Co Ltd
Original assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2017-11-01
Filing date: 2017-11-01
Publication date: 2018-03-13

2017-11-01 Application filed by Sichuan Changhong Electric Co Ltd
2017-11-01 Priority to CN201711059530.3A
2018-03-13 Publication of CN107798111A
Application granted
2021-04-06 Publication of CN107798111B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5011 Pool
    • G06F2209/5018 Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for exporting large batches of data in a distributed environment that addresses the shortcomings of traditional data export. Single-table queries are first run with multiple threads, and the retrieved data are then used in a second round of multithreaded queries against the associated tables, which speeds up data export and shortens user waiting time. By applying multithreading at several stages, the invention also reduces the load on the database server and the application server while shortening user waiting time.

Description

Method for exporting large batches of data in a distributed environment
Technical field
The present invention relates to the technical field of data processing, and in particular to a method for exporting large batches of data in a distributed environment.
Background
As software keeps evolving and requirements keep growing, traditional design approaches can no longer meet business needs. A distributed development model must therefore be adopted and the business split into separate services, with the database layer designed as multiple databases and multiple tables. Cloud-platform operators and enterprise administrators need to export and analyze operational data at any time. Software developers can build features that meet data analysts' requirements, but taking a requirement to production involves a series of cumbersome steps: requirements analysis, outline design, coding, testing, bug fixing, and release. Markets, however, change quickly and opportunities are fleeting, so timely analysis of operational data has become an essential task for every website operator.
In routine work, operators prefer to export data to Excel spreadsheets so they can manipulate and visualize it, which makes exporting large volumes of data a problem developers must solve. Conventional architectures use a single database and a single server, so the data can simply be queried directly and exported to Excel. In a distributed environment, however, large volumes of data cannot be exported this way. The main difficulty is that the database is split across multiple databases: the traditional approach relies on join queries, and join queries are no longer possible once the data have been split across databases.
It follows that exporting in a distributed environment the traditional way forces users to wait a long time, and the system faces M*N queries (where M is the number of records queried and N is the number of tables involved). For example, exporting 10,000 records that involve 3 tables requires 30,000 database queries. Under high concurrency the database is bound to collapse. The traditional table-by-table export is therefore ill-suited to speeding up data export and reducing database query time.
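The M*N figure can be checked with a few lines of Java. The per-row count reproduces the example above (10,000 records, 3 tables). The batched count is a hypothetical comparison only: it assumes the main table is read in pages of 1,000 records and each associated table is read with IN-clause batches of 1,000 ids. These sizes, and the class and method names, are illustrative assumptions rather than values from the patent.

```java
// Worked example of the M*N query count, compared with a hypothetical batched strategy.
// Page size and IN-batch size are illustrative assumptions, not values from the patent.
class QueryCountExample {

    // Traditional export: one query per record per table, i.e. M * N.
    static long perRowQueries(long records, int tables) {
        return records * tables;
    }

    // Hypothetical batched export: the main table is read in pages of pageSize rows, and
    // each of the remaining (tables - 1) tables is read with IN-clause batches of batchSize
    // ids. Ceiling division makes a partial last page or batch count as one query.
    static long batchedQueries(long records, int tables, int pageSize, int batchSize) {
        long mainTablePages = (records + pageSize - 1) / pageSize;
        long batchesPerTable = (records + batchSize - 1) / batchSize;
        return mainTablePages + (tables - 1) * batchesPerTable;
    }

    public static void main(String[] args) {
        System.out.println(perRowQueries(10_000, 3));                // 30000
        System.out.println(batchedQueries(10_000, 3, 1_000, 1_000)); // 30
    }
}
```

Under those assumptions the same export needs 30 queries instead of 30,000, which illustrates why paging the main table and batching lookups on the associated tables relieves the database.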
Summary of the invention
The purpose of the present invention is to overcome the shortcomings described in the background above by providing a method for exporting large batches of data in a distributed environment: single-table queries are first run with multiple threads, and the retrieved data are then used in a second round of multithreaded queries against the associated tables, which effectively speeds up data export and shortens user waiting time.
To achieve the above technical effect, the present invention adopts the following technical solution:
A method for exporting large batches of data in a distributed environment, comprising the following steps:
A. The web front end sends the conditions for the data to be exported. An aggregate query counts the records that meet the user's export conditions, and the number of records each thread must query is calculated;
B. A thread implementing the java.util.concurrent.Callable interface is created; this thread retrieves the order-table data page by page;
C. A thread pool is created according to the server configuration, the query tasks are added to the thread pool, and an optimal number of threads is preset for each thread pool. Following the thread pool's scheduling strategy, the query tasks are invoked automatically to query the order table, and the retrieved order-table data are saved to a first file;
D. The associated data are queried: query tasks corresponding one-to-one with the order table are created, the data retrieved from each single table are paged, and the results are stored in memory as a Map whose key is the id of the order record and whose value is the data that must be fetched from the associated table;
E. The data in the first file are assembled with the Map data obtained in step D into a List that meets the export requirements, and the List is stored in a second file;
F. The data in the second file are parsed, read as an IO stream, and written into an Excel file;
G. An upper limit on the number of Excel rows is preset. While data are written into the Excel file in step F, once the number of rows written reaches this limit, the Map data held in memory are written to the hard disk;
H. Using the HTTP protocol, the file data are written into the response output stream of the web front end's HTTP request, completing the bulk data export.
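Steps B and C above can be sketched with the JDK's concurrency utilities. This is a minimal sketch under stated assumptions: the in-memory ORDER_TABLE stands in for the real order table, and the names OrderPageTask and exportOrders, the page size, and the thread count are all hypothetical. In the patent the page would come from a paged JPA/SQL query rather than a list.

```java
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PagedOrderExport {

    // Hypothetical stand-in for the order table: 2,500 rows of "id,order-id".
    static final List<String> ORDER_TABLE = IntStream.rangeClosed(1, 2_500)
            .mapToObj(id -> id + ",order-" + id)
            .collect(Collectors.toList());

    // Step B: a Callable that returns one page (offset, limit) of the order table.
    static final class OrderPageTask implements Callable<List<String>> {
        private final int offset;
        private final int limit;

        OrderPageTask(int offset, int limit) {
            this.offset = offset;
            this.limit = limit;
        }

        @Override
        public List<String> call() {
            // In a real system this would be a paged SQL/JPA query.
            int end = Math.min(offset + limit, ORDER_TABLE.size());
            return new ArrayList<>(ORDER_TABLE.subList(offset, end));
        }
    }

    // Step C: one task per page on a fixed-size pool. Pages are written to the
    // "first file" in page order because invokeAll returns futures in task order.
    static Path exportOrders(int total, int pageSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (int offset = 0; offset < total; offset += pageSize) {
                tasks.add(new OrderPageTask(offset, pageSize));
            }
            Path firstFile = Files.createTempFile("orders-", ".csv");
            try (BufferedWriter out = Files.newBufferedWriter(firstFile, StandardCharsets.UTF_8)) {
                for (Future<List<String>> page : pool.invokeAll(tasks)) {
                    for (String row : page.get()) {
                        out.write(row);
                        out.newLine();
                    }
                }
            }
            return firstFile;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = exportOrders(ORDER_TABLE.size(), 1_000, 4);
        System.out.println(Files.readAllLines(file).size()); // 2500
    }
}
```

Using invokeAll keeps the file in page order without extra sorting; a production version would more likely stream each page to disk as soon as it completes.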
Further, when the order-table data are obtained in step B, a data-query method is encapsulated in the Service layer; it calls JPA and queries with SQL statements, and the Controller layer passes the paging parameters and query conditions to the Service layer to obtain the paged query results.
Further, when the optimal number of threads is preset for the thread pool in step C, it is calculated as: optimal number of threads = (thread wait time / thread CPU time + 1) * number of CPUs.
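The sizing formula translates directly into code. In this sketch the wait and CPU times are inputs that would have to be measured per task (for example by profiling a query task); rounding the result to the nearest integer is an assumption, since the text does not say how fractional results are handled. The class and method names are hypothetical.

```java
// Sketch of the step C formula:
// optimal threads = (thread wait time / thread CPU time + 1) * number of CPUs.
class ThreadPoolSizing {

    // waitMillis and cpuMillis are measured per task; they are inputs here,
    // not values given in the patent. Rounding to the nearest integer is an assumption.
    static int optimalThreads(double waitMillis, double cpuMillis, int cpus) {
        if (cpuMillis <= 0 || cpus <= 0) {
            throw new IllegalArgumentException("cpuMillis and cpus must be positive");
        }
        return (int) Math.round((waitMillis / cpuMillis + 1) * cpus);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Example: a query task that waits 90 ms on I/O for every 10 ms of CPU work.
        System.out.println(optimalThreads(90, 10, cpus));
    }
}
```

The formula rewards I/O-bound work: a task that spends 90% of its time waiting on the database yields ten threads per CPU, while a purely CPU-bound task yields one thread per CPU.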
Further, the scheduling strategy of the thread pool in step C is as follows: the threads added to the thread pool run first, and when the number of threads exceeds the preset number, the surplus threads wait in a waiting queue to run.
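One way to obtain this scheduling behavior with the JDK is a ThreadPoolExecutor whose core and maximum sizes are equal and whose work queue is an unbounded LinkedBlockingQueue: up to the preset number of tasks run at once, and any further tasks wait in FIFO order. This is a sketch under that assumption; the patent does not name the executor configuration, and BoundedQueryPool, newQueryPool and peakConcurrency are hypothetical names. The sleep stands in for a database query.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedQueryPool {

    // core == max with an unbounded queue: once all core threads are busy,
    // new tasks are queued instead of creating extra threads.
    static ThreadPoolExecutor newQueryPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    // Runs `tasks` dummy tasks and returns the highest number observed running at once.
    static int peakConcurrency(int threads, int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = newQueryPool(threads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // stand-in for a database query
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(peakConcurrency(2, 10)); // at most 2
    }
}
```

An unbounded queue never rejects work, which suits an export job; if memory is a concern, a bounded queue with a rejection policy is the usual alternative.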
Further, the export requirement in step E is that the exported data contain at least order information, creator information, and company information.
Further, when data are written into the Excel file in step G, POI's SXSSFWorkbook is mainly used.
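POI's SXSSFWorkbook limits memory use by keeping only a window of rows in memory and writing older rows to a temporary file on disk; in POI the window size is passed to the SXSSFWorkbook(int rowAccessWindowSize) constructor. Because POI is an external dependency, the sketch below reproduces only the step G threshold idea in plain Java: it buffers rows and writes them to disk whenever the preset row limit is reached. It writes plain text rather than .xlsx, does not use or imitate POI's API, and ThresholdRowWriter is a hypothetical name.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of step G: hold at most `rowLimit` rows in memory and write
// them to disk whenever the limit is reached. Not POI; no .xlsx is produced.
class ThresholdRowWriter implements AutoCloseable {
    private final int rowLimit;
    private final BufferedWriter disk;
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0;

    ThresholdRowWriter(Path target, int rowLimit) throws IOException {
        this.rowLimit = rowLimit;
        this.disk = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
    }

    void writeRow(String row) throws IOException {
        buffer.add(row);
        if (buffer.size() >= rowLimit) {
            flush();
        }
    }

    // Moves every buffered row to disk and clears the in-memory buffer.
    void flush() throws IOException {
        for (String row : buffer) {
            disk.write(row);
            disk.newLine();
        }
        disk.flush();
        buffer.clear();
        flushes++;
    }

    int bufferedRows() { return buffer.size(); }

    int flushCount() { return flushes; }

    @Override
    public void close() throws IOException {
        if (!buffer.isEmpty()) {
            flush();
        }
        disk.close();
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("export-", ".csv");
        try (ThresholdRowWriter w = new ThresholdRowWriter(out, 100)) {
            for (int i = 0; i < 250; i++) {
                w.writeRow("row-" + i);
            }
            System.out.println(w.bufferedRows()); // 50 rows still in memory
        }
        System.out.println(Files.readAllLines(out).size()); // 250
    }
}
```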
Compared with the prior art, the present invention has the following beneficial effects:
In the method of the present invention for exporting large batches of data in a distributed environment, the processing logic applies multithreading several times, reducing IO waiting time to a minimum. Single-table queries are run with multiple threads, and the retrieved data are then used in multithreaded queries against the associated tables, which effectively speeds up data export and shortens user waiting time while also helping to reduce the load on the database server and the application server.
Brief description of the drawings
Fig. 1 is a schematic processing flow of one embodiment of the method of the present invention for exporting large batches of data in a distributed environment.
Embodiments
The invention is further described below with reference to an embodiment.
Embodiment:
As shown in Fig. 1, the method for exporting large batches of data in a distributed environment specifically includes the following steps:
S101: The web front end sends the conditions for the data to be exported;
S102: The web front end sends a data-acquisition request carrying the export conditions to the corresponding controller method of the back-end interface, which then queries the total number of records to be exported according to those conditions;
S103: Based on the total number of records, calculate how many threads are needed for the single-table export: number of threads = total number of records / number of records each thread can handle, where the number of records a single thread can query is determined by load-testing the program. The resulting threads, i.e. the query tasks, are placed into a thread pool to query the order table.
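The S103 calculation can be sketched as follows. Plain integer division would drop the remainder, so this sketch rounds up and gives the leftover records their own thread; that rounding, the {offset, limit} range representation, and the names ExportPartitioner, threadCount and ranges are assumptions. The per-thread capacity would come from load testing, as the text says; here it is just a parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of S103: thread count = total records / records per thread (rounded up),
// plus the {offset, limit} range that each query task reads.
class ExportPartitioner {

    static int threadCount(long total, int perThread) {
        if (perThread <= 0) {
            throw new IllegalArgumentException("perThread must be positive");
        }
        return (int) ((total + perThread - 1) / perThread);
    }

    // Each element is {offset, limit} for one query task; the last range may be shorter.
    static List<long[]> ranges(long total, int perThread) {
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < total; offset += perThread) {
            result.add(new long[] {offset, Math.min(perThread, total - offset)});
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(threadCount(10_500, 2_000)); // 6
        for (long[] r : ranges(10_500, 2_000)) {
            System.out.println("offset=" + r[0] + " limit=" + r[1]);
        }
    }
}
```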
S104: The large volume of order-table data retrieved is stored in the file system, and the number of threads needed to assemble the data is calculated again.
For example, in an order-export requirement, the Excel file should show order information, creator information, and company information, which are stored in three tables in three separate databases. The order data are queried first and stored in file system A; the order IDs are then taken out and put into a List. Because two more tables need to be queried, two further thread pools are created; they fetch the creator information and the company information in the same way as the orders, and the results are stored in two Maps.
Each thread assembles one batch of data, and when it starts, the threads for the associated data tables are started together. Each associated thread fetches data in batches with an IN query, and the results are assembled into the layout expected in Excel. For example, if the Excel sheet contains order information, creator information, and company information, these three parts are assembled into one row.
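The assembly described in S104 can be sketched as follows: order ids are split into IN-clause batches, the creator and company lookups run concurrently, and each order is joined with both Maps by id to form one row. This is a minimal sketch under stated assumptions: fakeInQuery is a hypothetical stand-in for a real `WHERE order_id IN (...)` query, the "id,orderInfo" line format for the first file is assumed, and OrderRowAssembler and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class OrderRowAssembler {

    // Splits the order ids into chunks of at most batchSize, one chunk per IN query.
    static List<List<Long>> inBatches(List<Long> ids, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return batches;
    }

    // Hypothetical stand-in for "SELECT ... FROM <table> WHERE order_id IN (batch)".
    static Function<List<Long>, Map<Long, String>> fakeInQuery(String prefix) {
        return batch -> {
            Map<Long, String> rows = new HashMap<>();
            for (Long id : batch) {
                rows.put(id, prefix + id);
            }
            return rows;
        };
    }

    // Runs one IN query per batch and merges the results into a Map keyed by order id.
    static Map<Long, String> fetchByIds(List<Long> ids, int batchSize,
                                        Function<List<Long>, Map<Long, String>> inQuery) {
        Map<Long, String> result = new HashMap<>();
        for (List<Long> batch : inBatches(ids, batchSize)) {
            result.putAll(inQuery.apply(batch));
        }
        return result;
    }

    // orderLines are "id,orderInfo" rows read back from the first file. The creator and
    // company lookups run concurrently, then each order becomes one Excel-style row.
    static List<String> assemble(List<String> orderLines, int batchSize) throws Exception {
        List<Long> ids = new ArrayList<>();
        for (String line : orderLines) {
            ids.add(Long.parseLong(line.split(",", 2)[0]));
        }
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Map<Long, String>> creators =
                    pool.submit(() -> fetchByIds(ids, batchSize, fakeInQuery("creator-")));
            Future<Map<Long, String>> companies =
                    pool.submit(() -> fetchByIds(ids, batchSize, fakeInQuery("company-")));
            Map<Long, String> creatorById = creators.get();
            Map<Long, String> companyById = companies.get();
            List<String> rows = new ArrayList<>();
            for (String line : orderLines) {
                String[] parts = line.split(",", 2);
                long id = Long.parseLong(parts[0]);
                rows.add(parts[1] + "," + creatorById.get(id) + "," + companyById.get(id));
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> orders = Arrays.asList("1,order-1", "2,order-2", "3,order-3");
        // Prints one row per order, starting with "order-1,creator-1,company-1".
        assemble(orders, 2).forEach(System.out::println);
    }
}
```

Keying both Maps by order id turns the cross-database join into in-memory lookups, so each associated table is queried once per batch rather than once per order.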
S105: The data in the file are read as a stream and written into the Excel spreadsheet. A row-count threshold is set for the spreadsheet, and whenever the threshold is reached the data are automatically written to the hard disk; repeating this loop exports the whole batch into Excel. Finally, the HTTP protocol is used to write the file data into the response output stream of the web front end's HTTP request, completing the export of the large batch of data.
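The final HTTP step amounts to copying the finished file into the response body. In a servlet container the target stream would typically be obtained from HttpServletResponse.getOutputStream(), usually after setting a Content-Type such as application/vnd.openxmlformats-officedocument.spreadsheetml.sheet and a Content-Disposition attachment header; those details are assumptions, since the patent does not specify them. To stay self-contained, this sketch accepts any OutputStream, and ResponseStreamer and streamToResponse are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the final step: copy the finished export file into the HTTP response body.
// `body` stands in for the servlet response output stream.
class ResponseStreamer {

    // Files.copy streams the file in chunks, so the whole export is never held in memory.
    static long streamToResponse(Path exportFile, OutputStream body) throws IOException {
        long bytes = Files.copy(exportFile, body);
        body.flush();
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("export-", ".xlsx");
        Files.write(file, "demo".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        System.out.println(streamToResponse(file, body)); // 4
    }
}
```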
It should be understood that the above embodiment is merely an exemplary implementation used to illustrate the principle of the present invention, and the invention is not limited thereto. Those skilled in the art can make various changes and improvements without departing from the spirit and essence of the present invention, and such changes and improvements are also regarded as falling within the protection scope of the present invention.

Claims (6)

1. A method for exporting large batches of data in a distributed environment, characterized by comprising the following steps:
A. the web front end sends the conditions for the data to be exported; an aggregate query counts the records that meet the user's export conditions, and the number of records each thread must query is calculated;
B. a thread implementing the java.util.concurrent.Callable interface is created; this thread retrieves the order-table data page by page;
C. a thread pool is created according to the server configuration, the threads, i.e. the query tasks, are added to the thread pool, and an optimal number of threads is preset for each thread pool; following the scheduling strategy of the thread pool, the query tasks are invoked automatically to query the order table, and the retrieved order-table data are saved to a first file;
D. the associated data are queried: query tasks corresponding one-to-one with the order table are created, the data retrieved from each single table are paged, and the results are stored in memory as a Map whose key is the id of the order record and whose value is the data that must be fetched from the associated table;
E. the data in the first file are assembled with the Map data obtained in step D into a List that meets the export requirements, and the List is stored in a second file;
F. the data in the second file are parsed, read as an IO stream, and written into an Excel file;
G. an upper limit on the number of Excel rows is preset; while data are written into the Excel file in step F, once the number of rows written reaches this limit, the Map data held in memory are written to the hard disk;
H. using the HTTP protocol, the file data are written into the response output stream of the web front end's HTTP request, completing the bulk data export.
2. The method for exporting large batches of data in a distributed environment according to claim 1, characterized in that, when the order-table data are obtained in step B, a data-query method is encapsulated in the Service layer, JPA is called and SQL statements are used for the query, and the paging parameters and query conditions are passed from the Controller layer to the Service layer to obtain the paged query results.
3. The method for exporting large batches of data in a distributed environment according to claim 1, characterized in that, when the optimal number of threads is preset for the thread pool in step C, it is calculated as: optimal number of threads = (thread wait time / thread CPU time + 1) * number of CPUs.
4. The method for exporting large batches of data in a distributed environment according to claim 1, characterized in that the scheduling strategy of the thread pool in step C is: the threads added to the thread pool run first, and when the number of threads exceeds the preset number, the surplus threads wait in a waiting queue to run.
5. The method for exporting large batches of data in a distributed environment according to claim 1, characterized in that the export requirement in step E is that the exported data contain at least order information, creator information, and company information.
6. The method for exporting large batches of data in a distributed environment according to claim 1, characterized in that, when data are written into the Excel file in step G, POI's SXSSFWorkbook is mainly used.
CN201711059530.3A 2017-11-01 2017-11-01 Method for exporting data in large batch in distributed environment Active CN107798111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711059530.3A CN107798111B (en) 2017-11-01 2017-11-01 Method for exporting data in large batch in distributed environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711059530.3A CN107798111B (en) 2017-11-01 2017-11-01 Method for exporting data in large batch in distributed environment

Publications (2)

Publication Number Publication Date
CN107798111A 2018-03-13
CN107798111B 2021-04-06

Family

ID=61548636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711059530.3A Active CN107798111B (en) 2017-11-01 2017-11-01 Method for exporting data in large batch in distributed environment

Country Status (1)

Country Link
CN (1) CN107798111B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6542920B1 (en) * 1999-09-24 2003-04-01 Sun Microsystems, Inc. Mechanism for implementing multiple thread pools in a computer system to optimize system performance
CN101996067A (en) * 2009-08-19 2011-03-30 阿里巴巴集团控股有限公司 Data export method and device
CN102360310A (en) * 2011-09-28 2012-02-22 中国电子科技集团公司第二十八研究所 Multitask process monitoring method and system in distributed system environment
CN103034735A (en) * 2012-12-26 2013-04-10 北京讯鸟软件有限公司 Big data distributed file export method
CN103092993A (en) * 2013-02-18 2013-05-08 五八同城信息技术有限公司 Data exporting method and data exporting device
CN103412961A (en) * 2013-09-04 2013-11-27 广东全通教育股份有限公司 Processing method and system for real-time exporting report form of mass data
CN103793519A (en) * 2014-02-14 2014-05-14 浪潮通信信息系统有限公司 Automatic tool supporting exportation of mass data
US8782101B1 (en) * 2012-01-20 2014-07-15 Google Inc. Transferring data across different database platforms
CN103995807A (en) * 2013-02-16 2014-08-20 长沙中兴软创软件有限公司 Massive data query and secondary processing method based on Web architecture
CN104679813A (en) * 2013-11-28 2015-06-03 三星电子株式会社 Data storage device, data storage method and data storage system
EP2777009A4 (en) * 2011-11-10 2015-06-17 Microsoft Technology Licensing Llc Export of content items from multiple, disparate content sources
CN105740293A (en) * 2014-12-12 2016-07-06 金蝶软件(中国)有限公司 Data export method and device
CN106095775A (en) * 2016-05-24 2016-11-09 中国银行股份有限公司 A kind of method and system realizing data query or derivation
CN106407231A (en) * 2015-08-03 2017-02-15 天脉聚源(北京)科技有限公司 A data multi-thread export method and system
CN106776829A (en) * 2016-11-28 2017-05-31 成都广达新网科技股份有限公司 A kind of data guiding system and its method of work

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZX星辰: "Java Excel SXSSFWorkbook大量数据导出" (Java Excel SXSSFWorkbook: exporting large volumes of data), 《HTTPS://BLOG.CSDN.NET/ZXXINGCHEN/ARTICLE/DETAILS/70159473》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299157A (en) * 2018-08-27 2019-02-01 杭州安恒信息技术股份有限公司 A data export method and device for distributed large single table
CN110532311A (en) * 2019-08-14 2019-12-03 泰安协同软件有限公司 A kind of distributed data deriving method and system based on queue
CN110532311B (en) * 2019-08-14 2023-11-28 泰安协同软件有限公司 Distributed data export method and system based on queues
CN111914151A (en) * 2020-08-11 2020-11-10 上海毅博电子商务有限责任公司 Association table object query optimization method
CN113177826A (en) * 2021-05-20 2021-07-27 青岛海信智慧生活科技股份有限公司 Method and device for configuring commodities and cells in batch

Also Published As

Publication number Publication date
CN107798111B (en) 2021-04-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant