
WO2004077216A2 - System and method for heterogeneous data migration in real-time - Google Patents


Info

Publication number
WO2004077216A2
Authority
WO
WIPO (PCT)
Prior art keywords
file
data
database
recited
operations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IN2004/000027
Other languages
French (fr)
Other versions
WO2004077216A3 (en)
Inventor
Vinayak K. Rao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vaman Technologies (R & D) Ltd
Original Assignee
Vaman Technologies (R & D) Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vaman Technologies (R & D) Ltd
Publication of WO2004077216A2
Publication of WO2004077216A3


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/214: Database migration support

Definitions

  • the Dispatcher Agent 125 is the buffering module, which schedules the data output across various concurrent requests. Its primary job is to balance the requested data flow across various queries, taking care of query timeouts, buffer formatting and managing data caching when concurrent requests demand the same data.
  • the Request Analyzer 115 acts like a command interpreter and manages legacy data file operations as object operations.
  • Each legacy data file maps to one or more objects, and operations like 'read', 'write', 'open', 'close', 'tell', 'flush' etc. are translated to SELECT/ INSERT/ UPDATE etc.
  • Most legacy databases support only two basic DBMS object entities, namely tables and indexes. The interpretation, translation and execution of other object entities in terms of these basic objects is handled by the Request Analyzer 115.
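  • As a minimal illustration of this interpretation step, the sketch below (in C, with hypothetical names; the patent does not publish its actual dispatch tables) shows which SQL verb each trapped file operation becomes:

    /* Hypothetical sketch: which SQL verb a trapped file operation is executed as. */
    typedef enum { OP_OPEN, OP_CLOSE, OP_READ, OP_WRITE,
                   OP_SEEK, OP_TELL, OP_FLUSH } FileOp;

    const char *sql_verb_for(FileOp op) {
        switch (op) {
        case OP_OPEN:  return "CONNECT";      /* open maps to a connect on the target */
        case OP_CLOSE: return "DISCONNECT";
        case OP_READ:  return "SELECT";       /* read by ROWID range */
        case OP_WRITE: return "INSERT|UPDATE|DELETE"; /* resolved by offset/buffer analysis */
        default:       return "CURSOR-POSITIONING";   /* seek, tell, flush */
        }
    }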
  • the Cache Manager 110 acts like a resource manager or temporary object repository, managing various database objects.
  • the objects could be physically different file entities having various paths, with their source-to-target translations maintained like a look-up table used for mapping.
  • the Cache Manager 110 serves as the primary object definition storage, which is always looked up for translation, interpretation and execution of any command or file operations.
  • the Event Driver 120 is the heart of the Migration Agent 100.
  • the entire scheduling architecture of the Migration Agent 100 is built around a co-operative multithreading kernel.
  • the Interrupt Service Routine hook installed during the agent startup serves as the prime initiator for sourcing or sinking requests. Any operations on the mapped file are translated into a network / disk event primarily by the event driver.
  • every write request operation on the mapped file of the legacy server is archived onto a local database file via the Disk Agent 160 or on a Remote Database 165 via the Network Agent 130.
  • any crash recovery process demanding data can be routed on the local database via the Disk Agent 160 or on the Remote Database 165 via the network.
  • the ODBC clients 175 can use the data created by these parallel operations to service any requests without disturbing the original data files.
  • the Event Driver 120 functions as the legacy interface through the Interrupt Service Routine Hooks 180.
  • the Object and Data Mapper 105 translates the legacy data file operations like read/write etc. into the relational entities that are required to map Non-RDBMS data to relational data.
  • each file operation or data operation is translated to a corresponding database object(s) identifier on which the operations are performed (Ex: DDL, DML etc.).
  • the Parser 145 is a generic SQL-compatible syntax and semantic parser and analyzer, which validates ODBC client requests and, in conjunction with the Object / Data Mapper 105, translates the objects in the query to the corresponding data files in the legacy database file system.
  • Many operations, which cannot be performed given the limitations of the legacy data and the data-type support available, are shunted by the Parser 145 itself before the request is allowed to execute. For example, most legacy databases never had Large Object (LOB) or Variable Character (varchar) support, which are vital for running an enterprise.
  • the Disk Agent 160 performs the dual task of serving as the hook for the legacy data as well as acting as a translator from ODBC commands to the legacy data format and vice versa.
  • the Disk Agent 160 monitors every file operation on the legacy data files, and the relevant information required by the parent server is turned into a database command by the Event Driver 120 with the help of the Disk Agent 160.
  • when the Migration Agent 100 is configured for real-time local backup, the translation of flat-file data across database files into a centralized RDBMS format is also performed by the Disk Agent 160.
  • the DML Agent 150 and the Server Agent 155 translate every Data Definition Language, Data Control Language and Data Manipulation Language command and execute it on the legacy database within the limitations of the legacy data and server design. Extending RDBMS functionality to legacy data is the prime objective of these agents, but the restrictions of the capturing mode of legacy data bind this functionality to the executable minimum.
  • the Global Cache 135 reads and updates the configuration files and stores a copy of the configuration files.
  • the DDL Map 140 module prepares, updates and loads the data translation of the files into the Global Cache 135 for any operations on the files from the configuration settings. Since most object definitions reside here, the Global Cache 135 data helps in analyzing and translating a write operation into an INSERT / UPDATE / DELETE operation, based on the data definition, file offsets and data buffer specified in the WRITE operation (Ex: in case the offset is not an integral multiple of the record size, the write operation can be an UPDATE or a DELETE; upon analysis of the buffer, the WRITE operation is further translated to DELETE if NULL or delete data patterns are found).
  • FIG. 2 is a flow diagram illustrating the process by which the preferred embodiment of the present invention carries out the translation of heterogeneous data files into other heterogeneous file formats after the Migration Agent 100 is started 200.
  • the configuration file is automatically generated and configured.
  • the Migration Agent 100 reads the scheduled jobs 204 as per the configuration settings, such as the object (that is, the file name) and the set of functionalities, like the file operations enabled for execution.
  • the Migration Agent 100 can be an online backup system or notification system or just a passive client, which has to respond only when queries arrive.
  • the objects and their functionalities need to be obtained from the configuration.
  • for some files the agent can be configured just for notification, while for others it acts as a backup sub-system.
  • the specified functionalities may demand automatic execution of a predefined event at a defined interval.
  • the settings for these scheduled tasks are also verified, and a time-specific thread is forked to handle the jobs so scheduled.
  • the Migration Agent 100 probes the environment and compares the last state and the current state of the environment 206. Since most legacy databases depend on the OS file system, the operations and objects configured have to be validated by probing the environment and comparing the last and current states, as needed for successful agent functionality. This process basically checks the database files, paths, permissions, attributes and current state, i.e. whether each file is in use or in a usable state.
  • the Migration Agent 100 then proceeds to map and connect to the last known servers, if any server is available and configured 208. It then proceeds to prepare a DDL map table from configuration defaults 210. The Migration Agent 100 then proceeds to validate the map details with the current files and their status in the system 212. If the map details are found to be invalid, the Migration Agent 100 proceeds to generate an error message 214. However, if the map details are found to be valid, the Migration Agent 100 proceeds to load the map table with the updated and validated data 216.
  • the Migration Agent 100 then hooks to the system's interrupt service routines for file handling, once all the prerequisites as per the configuration are met. This is primarily done to trap any file activity in the system, which can be filtered across applications using the mapped file data configured to be operated upon. If the file objects to be operated upon are available as per the configuration file, a DDL map for the data translation of these files is loaded into the global cache for any operations on the files from the configuration settings.
  • the Migration Agent 100 then hooks to the system ISR and captures the prerequisite system information 218. It then waits for the hooked application and mapped files to be executed by any external application 220 and proceeds to poll for any pending connects; in the event a pending connect is found, a connection is established 222.
  • the Migration Agent 100 then checks if a valid event is triggered 224. In the event of a valid operation or event being triggered, the Migration Agent 100 classifies the operation 226, and as per the operation various steps are triggered. If the operation is invalid, the Migration Agent 100 keeps waiting for the hooked application and mapped file 220.
  • the Migration Agent 100 functionality varies as per the operation type, like 'Open', 'Close', 'Read', 'Write' etc., and the functionality configured for trapping these operations.
  • the Migration Agent 100 then proceeds to check if the operation is an open file operation 228. In the event that the operation is an Open file operation, the Migration Agent 100 proceeds to check if it is a mapped file 230.
  • This translation map is basically the target server's interpretation of the byte pattern of data in the file and can vary as per the various server representations. For example, if a single DDL data file has a few columns of data, the translation of the same file data for different servers connected to the agent can vary as per the data-type interpretation of the target server. Hence, as per the connectivity of the ODBC client, the byte interpretation or the DDL map can vary. In the event it is not a mapped file, the Migration Agent 100 proceeds to execute the default procedure 231, i.e. the file system's native interrupt service routine ('ISR').
  • the Migration Agent 100 proceeds to check if the mapped files are successfully opened for any operation 232.
  • a mapped data file, when opened for any operation irrespective of the application, is cached with its handle ID and the related object definition and translation map as set in the configuration file 234. After this process, the respective usage count is incremented as per the file operation 236. The Migration Agent 100 then proceeds to wait for the next operation 238.
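  • A minimal sketch of what such a cache entry might hold, with every field name assumed for illustration (the patent does not publish its structures):

    /* Hypothetical cache entry keyed by the OS file handle (steps 234/236). */
    typedef struct MappedFile {
        int  handle;         /* OS handle returned by the trapped Open          */
        char object[64];     /* mapped RDBMS object (table) name                */
        char ddl_map[128];   /* identifier of the translation map in the cache  */
        long position;       /* last file position, for offset-to-ROWID mapping */
        int  use_count;      /* open handles on this file: incremented 236,
                                decremented on Close 248                        */
    } MappedFile;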
  • the Migration Agent 100 proceeds to check if it is a Close file operation 240.
  • the Migration Agent 100 proceeds to check if the file handle is mapped 242. In the event that the file handle is not mapped, the Migration Agent 100 proceeds to execute the default procedure 231. However, if the file handle is mapped, it proceeds to check if the close operation is successful 244. In the event of an unsuccessful close operation, the Migration Agent 100 proceeds to check the usage count 246; in other words, it checks whether any users are left with open file handles for the same file. In the event of a successful close operation, the count is decremented 248 and the Migration Agent 100 then proceeds to check the count 246. If the count shows no remaining users, the Migration Agent 100 removes the object path from the cache 250 and waits for the next operation 238.
  • the Migration Agent 100 proceeds to classify it as a Read/Write/Seek/Tell or Flush operation 252.
  • the Migration Agent 100 checks if the file handle is mapped 254. In the event that the file handle is not mapped, the Migration Agent 100 executes the default procedure 231. However, if the file handle is mapped, the Migration Agent 100 uses the handle and the operation parameters to source or sink events, using look-up operations on the mapped object 256.
  • the Migration Agent 100 then checks if the operation requires notification before execution 258. In the event that the operation does not require notification before execution, it proceeds to check if it is a successful operation 260. However, if the operation requires notification before execution, the Migration Agent 100 proceeds to generate the notification with the operation parameters 262.
  • the Migration Agent 100 checks if notification is required after the successful operation execution 266 and then proceeds to wait for the next operation 238.
  • the proposed invention supports triggers or user-defined events to be executed before, during or after the hooked operations. In case an event notification is required (triggered) after the successful operation execution, the notification is generated along with the operation parameters after the operation and reported 268. This feature allows altering or using the parameters for intelligent decision-making before or after the execution of the event, with the result of the execution.
  • the Migration Agent then proceeds to wait for the next operation 238.
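  • The overall trap-classify-notify cycle of Fig. 2 can be sketched in C as below; all helper names and behaviors are illustrative stand-ins, not the patent's own code:

    #include <stdio.h>

    typedef enum { OP_OPEN, OP_CLOSE, OP_READ, OP_WRITE, OP_SEEK, OP_TELL, OP_FLUSH } FileOp;
    typedef struct { int handle; const char *object; } Mapped;

    /* Stubs standing in for the agent's real modules. */
    static Mapped *cache_lookup(int h) { static Mapped m = {3, "table1"}; return h == 3 ? &m : 0; }
    static void default_isr(void)        { puts("pass through to native file-system ISR"); } /* 231 */
    static int  notify_before(FileOp op) { return op == OP_WRITE; }   /* per configuration */
    static int  notify_after(FileOp op)  { (void)op; return 1; }
    static void notify(Mapped *m, long off) { printf("notify: %s @ %ld\n", m->object, off); }
    static int  translate_and_execute(FileOp op, Mapped *m, long off) {
        printf("op %d -> SQL on %s at offset %ld\n", op, m->object, off); return 1;
    }

    void on_hooked_operation(FileOp op, int handle, long offset) {
        Mapped *m = cache_lookup(handle);              /* steps 242 / 254 */
        if (!m) { default_isr(); return; }             /* step 231        */
        if (notify_before(op)) notify(m, offset);      /* steps 258, 262  */
        int ok = translate_and_execute(op, m, offset); /* step 256        */
        if (ok && notify_after(op)) notify(m, offset); /* steps 266, 268  */
    }                                                  /* then wait, 238  */

    int main(void) {
        on_hooked_operation(OP_WRITE, 3, 1664);  /* mapped handle: translated and notified */
        on_hooked_operation(OP_READ, 7, 0);      /* unmapped handle: default ISR           */
        return 0;
    }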
  • Fig. 3 is a block diagram depicting the interfacing of the Migration Agent 100 with a computer system. It depicts the basic working layers of a computer system and a Migration Agent 100 of the current invention interfaced with it, in analogy to Open Systems Interconnection (OSI) reference model layers.
  • the working of a computer system can be explained as consisting of blocks of layers: the Hardware layer 300, the Basic Input/Output System (BIOS) 305, the Device Driver 310, the Operating System Interrupt Service Routine (ISR) Hooks 315, the OS File System layer 320 and the Application Layer 325, with the file operation hooks trapped and redirected 330 using the Event Driver 120, which forms part of the Migration Agent 100, in addition to the Network Agent 130, which is connected to the network.
  • the lowest layer is the Hardware layer 300, which includes physical devices such as hard disk drives (HDD), floppy disk drives (FDD), the monitor and the motherboard.
  • the Basic Input/Output System (BIOS) 305 is above the Hardware layer 300 and below the Device Driver layer 310.
  • BIOS 305 is the program which a personal computer's microprocessor uses to get the computer system started after it is turned on. It also manages data flow between the computer's operating system and attached devices such as the hard disk, video adapter, keyboard, mouse, and printer.
  • BIOS 305 is a program that is made accessible to the microprocessor on an erasable programmable read-only memory (EPROM) chip. When a computer is switched on, the microprocessor passes control to the BIOS program 305.
  • when the BIOS 305 boots up a computer, it first determines whether all of the attachments are in place and operational, and then it loads the operating system (or key parts of it) into the computer's random access memory (RAM) from the hard disk or diskette drive. With the BIOS 305, the operating system and its applications are freed from having to understand exact details (such as hardware addresses) of the attached input/output devices. Although the BIOS 305 is theoretically always the intermediary between the microprocessor and the input/output devices' control information and data flow, in some cases the BIOS 305 can arrange for data to flow directly to memory from devices (such as video cards) that require faster data flow to be effective.
  • the Device Driver block 310 is above the BIOS layer 305 and below the Operating System Interrupt Service Routine Hooks layer 315.
  • device drivers are programs that are written so that the computer knows what to do with a device or a virtual device (a piece of software that is written to act like a device).
  • the Operating System Interrupt Service Routine (OS ISR) Hooks layer 315 is above the Device Driver block 310 and below the OS File System layer 320.
  • the Migration Agent 100 hooks to the OS ISR block and traps operations like Open, Close, Read, Write, Seek, Tell, Flush etc. on the files.
  • the Migration Agent 100 interfaces with the OS ISR Hooks 315, and the hooks are trapped by the Redirected Hook Module 330.
  • the Redirected Hook Module 330 in turn hands these hooks to the Event Driver 120.
  • the Event Driver 120 then interacts with the Migration Agent 100.
  • the OS File System 320 lies below the Application layer 325 and above the OS ISR Hooks 315.
  • the OS File System 320 is the way in which files are named and where they are placed logically for storage and retrieval.
  • the DOS, Windows, OS/2, Macintosh, and UNIX-based operating systems all have file systems in which files are placed somewhere in a hierarchical (tree) structure.
  • a file is placed in a directory (folder in Windows) or subdirectory at the desired place in the tree structure.
  • the Application Layer 325 is the top-most block of the computer system. It contains data in user-defined formats and applications using business logic.
  • the Migration Agent 100 residing on the legacy server is invoked as a service or a daemon process.
  • the Migration Agent 100 redirects the file operation activities of a particular set of files, ignoring the other irrelevant ones. Generally only very few operations are supported on files, like open, close, read, write, tell, seek, flush etc.
  • the initial configuration or communication between the Migration Agent 100 and the client's OS can establish the files to be monitored or operated upon.
  • the Migration Agent 100 caches these file details, and their mapped Data Definition Language tables are monitored for every Interrupt Service Routine invocation while their operational parameters are tracked in memory.
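  • By way of illustration only (this is not the patent's code), equivalent trapping can be demonstrated on a modern Unix system by library interposition: build the snippet below as a shared object and load it with LD_PRELOAD, so that every write issued by the legacy process passes through the agent first.

    /* gcc -shared -fPIC hook.c -o hook.so -ldl ; run: LD_PRELOAD=./hook.so legacy_app */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <unistd.h>

    ssize_t write(int fd, const void *buf, size_t count) {
        static ssize_t (*real_write)(int, const void *, size_t);
        if (!real_write)
            real_write = (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");
        /* Here the agent would test whether fd belongs to a mapped legacy file
           and translate the buffer into INSERT/UPDATE/DELETE before passing on. */
        char msg[64];
        int n = snprintf(msg, sizeof msg, "trapped write: fd=%d len=%zu\n", fd, count);
        if (n > 0) real_write(2, msg, (size_t)n);  /* log via the real write */
        return real_write(fd, buf, count);
    }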
  • FIG. 4(a) is a block diagram illustrating one embodiment of the working of a Legacy System using COBOL, depicting COBOL and its internal components.
  • legacy applications 400 and data 405 are those that have been inherited from languages, platforms, and techniques earlier than current technology.
  • COBOL (Common Business Oriented Language) was the first widely used programming language for business applications. Many payroll, accounting and other business application programs written in COBOL over the past years are still in use. The language is generally perceived as out-of-date, and COBOL programs are generally viewed as legacy applications.
  • the Indexed Sequential Access Method (ISAM) 410 is a file management system developed at IBM that allows records to be accessed either sequentially (in the order they were entered) or randomly (with an index). Each index defines a different ordering of the records.
  • An employee database may have several indexes, based on the information being sought. For example, a name index may order employees alphabetically by last name, while a department index may order employees by their department. A key is specified in each index. For an alphabetical index of employee names, the last name field would be the key.
  • the ISAM 410 is interfaced with the OS files 415 and hence redirects the OS ISR hooks as depicted in the figure.
  • Now referring to the second part of FIG. 4, illustrating the preferred embodiment of the working of a Legacy Application 440 after the Migration Agent 100 is started.
  • Fig. 5 illustrates an alternative embodiment of the present invention depicting the working of a Legacy system after the Migration Agent 100 is started as part of the Agent Software Service 520.
  • the Migration Agent 100 of the present invention is invoked as a service or daemon process.
  • the Agent Software Service 520 mimics the working of ISAM 525 and the Migration Agent 100 is hooked to the Legacy Data 515 and the Legacy Application 500.
  • the Migration Agent 100 redirects the file operation activities of a particular set of files ignoring the other irrelevant ones.
  • Besides, the invention can also trap file operations like open, close, read, tell, seek, flush etc. on the newer RDBMS, which could be ODBC-compliant 505.
  • the initial configuration or communication between the Migration Agent 100 and the client's OS files 510 can establish the files to be monitored or operated upon.
  • the Migration Agent 100 caches these file details, and their mapped DDL tables are monitored for every ISR invocation while their operational parameters are tracked.
  • the Migration Agent 100 service has an option to hook to this parent ISR and provide RDBMS functionality.
  • the Migration Agent 100 has the option to ignore the parent DB engine and, without any application change, translate the required functionality into ODBC-compatible commands.
  • the Legacy Application 500 thus starts working, without any change, in an ODBC-compatible standard, and data generated from the legacy application is seamlessly available to the RDBMS application.
  • FIG. 6 depicts an enterprise-wide network that may typically include Data Servers 600, 605, Clients 610, 615, 620 and Application Services 625.
  • the enterprise-wide network is generally a combination of systems that could be legacy systems or ODBC/OLEDB / JDBC compliant systems including Application Services 625, Data Servers 600, 605 and Clients 610, 615, 620. That is to say that the data or application servers could be either a Legacy Server 600 or an ODBC/OLEDB / JDBC compliant Server 605. Further the Clients can either be Legacy Client 610, ODBC/OLE client 615 or a Remote RDBMS console client 620.
  • the Migration Agent 100 provides for seamless connectivity and makes data available to various clients irrespective of their nature. The Migration Agent 100 allows for real time data migration and hence data is available for use by various other processes that want these data.
  • the Migration Agent 100 provides an interactive way of handling changes in data once the agent hooks to the Operating System Interrupt service routine.
  • the Migration Agent 100 uses its Event Driver 120 to send packets of data as well as headers. After the Migration Agent 100 is hooked, all operation like Open, Close, Tell, Flush, Seek, Write, Read can be tracked on the mapped files.
  • the Event Driver 120 translates these operations into an event, and any changes to the source data are notified as a message by the agent kernel scheduling the various operations.
  • any request query invoked from the ODBC clients 615 invokes an event, which translates the query data 630 into requisite operations to fetch results using Message Bus 635.
  • the client does not know whether data resides on Legacy Data Server 600 or any other data server, thus providing seamless connectivity.
  • Consider an ODBC / OLEDB / JDBC compliant database Client 615 requesting data from a Legacy Server application 625.
  • the requested data cannot be made available without some migration tool that allows seamless real-time migration.
  • the Migration Agent 100 provides for such seamless real-time connectivity and sits between the Legacy Application Services 625 and the RDBMS server 605.
  • a request for legacy data by an ODBC/OLEDB/JDBC compliant client of such an RDBMS Server 605 is serviced by the Migration Agent 100, which provides seamless connectivity to the Legacy Application Services, including Data Servers; such data then becomes available to the client that requested it.
  • the Migration Agent 100 provides seamless connectivity between clients irrespective of their nature and data servers or RDBMS servers.
  • the Migration Agent 100 hooks to the Operating System's Interrupt service routine so that it can track the operations being performed on predefined files.
  • Another consideration can also include a remote RDBMS Console Client 620 requesting data from the Legacy Application Services 625.
  • the Migration Agent 100 provides a way for seamless connectivity between such a remote client and legacy services.
  • the Migration Agent 100 hooks to the Operating System's Interrupt service routine so that it can track the operations being performed on predefined files.
  • Fig. 7 depicts the preferred embodiment of the invention and comprises a mechanism to translate/map the Operating System file operations such as 'Open' 700, 'Close' 705, 'Tell' 710, 'Flush' 715, 'Seek' 720, 'Write' 725, 'Read' 730 etc. into Object ID 735, Row ID 740 and Buffer/Tuple Data 745, using the original file arguments/parameters like Handle 750, Offset 755 and Buffer 760.
  • the Request Analyzer 115, using the Object / Data Mapper 105, performs the translation as shown in the figure.
  • the Object Translator 765 maps the OS file operation's Handle parameter/argument to the Object ID 735, the Offset Translator maps the file offset on which the legacy data is read or written to a ROWID (record identifier) 740, and the Operation Translator, according to DDL/DML 775 and using the Network Agent module 130, translates the file buffer to buffer/tuple data 745 as per the object definition in the relational database.
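  • A minimal sketch of the first two translators for a fixed-size-record file, with the layout parameters assumed to come from the DDL map (all names are hypothetical):

    /* One DDL-map entry per mapped file; header/record sizes are assumptions. */
    typedef struct { int handle; int object_id; long header; long recsize; } DdlEntry;

    /* Object Translator 765: Handle 750 -> Object ID 735. */
    int object_id_for(const DdlEntry *e, int handle) {
        return (e->handle == handle) ? e->object_id : -1;
    }

    /* Offset Translator 770: Offset 755 -> ROWID 740, for fixed-size records. */
    long rowid_for(const DdlEntry *e, long offset) {
        return (offset - e->header) / e->recsize;
    }

    /* The Operation Translator 775 would then pack Buffer 760 into tuple data
       745 per the object definition, which is format-specific. */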
  • the original file arguments/parameters, such as the handle used by the various file operations, are connected to the Hooked files 780.
  • the file operation 'Open' 700 takes a path and returns a handle 790.
  • the file operation 'Close' 705 takes a handle.
  • the other similar operations like 'Tell' 710, 'Seek' 720 etc. take a handle and return a position 792.
  • the 'Write' 725, 'Read' 730 etc. operations take the handle 750, Offset 755 and Buffer 760 and return results like the bytes written successfully 794.
  • the Operation Translator according to DDL/DML 775, clubbed with the Network Agent 130 and the other Migration Agent modules, can be used.
  • Fig. 8 depicts the mechanism using the Object Translator 765 to map legacy source data, which comprises data files lying in their respective paths.
  • Each file can be a single DDL file or a multiple DDL file, that is, one single file having data for more than one object.
  • Once these files are mapped via the Migration Agent 100 to an RDBMS, they get translated to TABLEs: table1 with name file1 (which the user can change), table2 as file2, and so on.
  • such application data files 815 may be five data files named file1 820, file2 825, file3 830, file4 835 and file5 840.
  • These files get translated to TABLEs: table1 with name file1 850 (which the user can change), table2 as file2 855, and so on up to table5 named file5 860.
  • the source files could either be linked files, which contain only DDL, or embedded files that contain both DDL and DML. Hence, as seen, a single-DDL file of non-relational data maps to a single object in an RDBMS.
  • FIG. 9 depicts the mechanism using the Offset Translator 770 to map the offset parameter or argument of a file operation such as 'Seek', 'Flush', 'Write', 'Read' etc. for different types of non-relational source data files to a ROWID 740.
  • a ROWID 740 in an RDBMS is the physical unique position of a record, similar to the absolute value of a file offset. Care has to be taken to update these translations if the source data is re-indexed or packed.
  • the Source could have non-relational data files such as those shown in the figure.
  • the data files could be a single DDL file 900 or a multiple DDL file 905. Further, the single DDL files 900 could have an optional, fixed or variable-sized header.
  • the record could be either of a fixed size or variable size.
  • a single DDL data file with a fixed-sized header and fixed-size records 920, such as a dBase file.
  • another single DDL data file with an optional header and fixed-size records 925, such as some versions of COBOL files.
  • another single DDL data file with an optional variable-sized header and variable-sized records preceded by a record length (RL) 930, such as in a user-defined C application.
  • various other types of multiple DDL source data files 905 are possible, as shown in the figure. Consider a multiple DDL data file 905 with an optional header and data in each record preceded by Object Information (OI) and then Record Length (RL) 935, as shown in the figure.
  • the Offset Translator 770 maps these offsets of the different source data files to a unique ROWID 740.
  • the Object Translator 765 creates the table object in the target database, and an extent of blocks is allocated 950 to it. This allocated space is called the object extent; each such extent consists of multiple blocks.
  • each block 955 further consists of a Block Header 960 and a Tuple Index 966 that points to the address of each unique ROWID 740 where data is recorded. Hence the record/tuple having a unique ROWID 740 is created after the Offset Translator 770 maps the file offset parameter/argument.
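  • As a worked example under assumed numbers: for a single-DDL file with a fixed 512-byte header and fixed 128-byte records, the file offset 1664 would map to ROWID (1664 - 512) / 128 = 9, that is, the tenth record counting from zero.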
  • FIG. 10 depicts the mechanism using the Operation Translator according to DDL/DML 775 and the Network Agent 130, by which the return values or results of file operations like Read/Write etc. are mapped to buffer/tuple data 745.
  • each non-relational source data file has to be individually opened for Read/Write operations, which is mapped to a connect 1015 in the target RDBMS.
  • this Connect opens the data files and all objects in the data file through a single Open command executed on the target database.
  • the different modes of opening the source file, that is Read Mode 1020, Read/Write Mode 1025 or Append Mode 1030, are translated to appropriate cursor types whenever Read/Write operations are performed.
  • the OPEN operation on Text as well as Binary source data files can either be Read Only Mode 1020, Read/Write Mode 1025 or Append Mode 1030.
  • the OPEN operation on source data file in Read Only Mode maps to Read Only Cursor 1035 in the respective target file.
  • the OPEN operation on source data files in Read/Write mode 1025 as well as Append Mode 1030 maps to Keyset/Dynamic cursors 1040 in the respective target file.
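  • This mode-to-cursor mapping reduces to a one-function sketch (all names assumed):

    typedef enum { MODE_READ_ONLY, MODE_READ_WRITE, MODE_APPEND } OpenMode;
    typedef enum { CURSOR_READ_ONLY, CURSOR_KEYSET_DYNAMIC }      CursorType;

    CursorType cursor_for(OpenMode m) {
        return (m == MODE_READ_ONLY) ? CURSOR_READ_ONLY        /* 1020 -> 1035 */
                                     : CURSOR_KEYSET_DYNAMIC;  /* 1025/1030 -> 1040 */
    }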
  • the 'READ' 1100 file operation is mapped to 'SELECT' 1105.
  • based on the file mode of the Read, it is correspondingly translated to 'SELECT' 1105, an RDBMS operation.
  • the READ mode depends on the cursor 1110: it could be Read Only, Keyset or Dynamic, based on how the source file was opened.
  • the buffer size and the current position specify the records to be read, which can be optimized to read all records 1120. As an example, consider that the current cursor position 1125 is at byte 400 and the next 400 bytes are to be read, that is, up to byte number 800, 1130. This is translated to records in the target database by a 'SELECT' with the ROWID as a parameter 1135.
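  • A compilable sketch of this byte-range-to-ROWID translation, assuming fixed 100-byte records with no header so that the 400-to-800 example above maps to ROWIDs 4 through 7 (the ROWID syntax is vendor-specific and table1 is illustrative):

    #include <stdio.h>

    /* Translate a READ of len bytes at position pos into a SELECT over ROWIDs. */
    void read_to_select(char *sql, size_t cap, long pos, long len, long recsize) {
        long first = pos / recsize;              /* byte 400 -> ROWID 4 */
        long last  = (pos + len - 1) / recsize;  /* byte 799 -> ROWID 7 */
        snprintf(sql, cap,
                 "SELECT * FROM table1 WHERE ROWID BETWEEN %ld AND %ld", first, last);
    }

    int main(void) {
        char sql[128];
        read_to_select(sql, sizeof sql, 400, 400, 100);
        puts(sql);  /* SELECT * FROM table1 WHERE ROWID BETWEEN 4 AND 7 */
        return 0;
    }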
  • if the source file 1200 is opened in append mode, then any subsequent 'write' 1210 is translated to 'INSERT' 1215 after verifying the offset position and the existence of data at that position. If the file is opened in Read/Write mode and the offset has existing data, the write gets translated to an 'UPDATE' 1220 statement (assuming the pre-requisites specified earlier are met; this assumption enforces that no empty blocks of data existed). If the file is opened in Read/Write mode, the offset has existing data and the buffer data is NULL (or marked by a DELETE pattern), then a 'DELETE' 1225 is interpreted, because the existing data is being cleared. Many non-relational formats use a delete marker at a particular offset with respect to the record data.
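  • The classification just described can be sketched as a single function; the all-NULL test and the '*' delete marker follow the dBase/FoxBase example cited earlier, and everything else is an assumption for illustration:

    #include <stddef.h>

    typedef enum { SQL_INSERT, SQL_UPDATE, SQL_DELETE } DmlVerb;

    /* Classify a trapped WRITE at `offset`, given where existing data ends. */
    DmlVerb classify_write(long offset, long end_of_data,
                           const unsigned char *buf, size_t len) {
        size_t i;
        int all_null = 1;
        for (i = 0; i < len; i++)
            if (buf[i] != 0) { all_null = 0; break; }

        if (all_null || (len > 0 && buf[0] == '*'))
            return SQL_DELETE;   /* existing data is being cleared, 1225 */
        if (offset >= end_of_data)
            return SQL_INSERT;   /* append past the end of data, 1215 */
        return SQL_UPDATE;       /* overwriting existing data, 1220 */
    }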
  • TELL/SEEK etc. are generally used to position the current file handle. These may optionally be used to translate to a ROWID for large amounts of data. Similarly, the FLUSH file operation may rarely be used unless the OS supports caching utilities like SMARTDrive; it can be used, if required, to trigger CHECKPOINTING.
  • FIG. 13 illustrates the computer screen when a Migration Agent 100 of the present invention is started/initiated.
  • FIG. 14 illustrates the computer screen when a preferred embodiment of the present invention is started/initiated and the Migration Agent 100 task is configured according to the classification of the file operation.
  • the data file is either single Data Definition Language 1405 or multiple DDL 1410.
  • the source file irrespective of the type and format can be selected 1415 from the browse option provided.
  • Destination Object 1420 can be user-defined.
  • the source and the target data files could be of different format.
  • the column name 1425 of source data file and target/destination data file are displayed.
  • other parameters like data types 1430, precision 1435 and scale 1440 for the source as well as the destination are displayed. All these parameters/arguments can be set as per the user's choice or needs.
  • FIG. 1 ⁇ illustrates the user interface 1600 of the present invention when scripts to be triggered are specified 1510.
  • the scripts could be any language script, even user-defined script are supported.
  • the scripts can be set as required. The configuration can be done to specify the scripts to be triggered corresponding to the file operations. Also the scripts could be triggered as, before 1515 or after 1520 the translation operation. If the script is to be triggered before the file translation/mapping the file operation trapped by Redirected hooks is performed after the trigger event returns a success flag. If the script is set to be triggered after the file translation/mapping the file operation trapped by Redirected hooks is performed before the trigger event occurs and next the hooks is returned to the original ISR.
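  • The before/after ordering reduces to the following control flow, where run_script is a hypothetical helper returning nonzero on success:

    /* Hypothetical: execute a configured script, e.g. via system(); nonzero = success. */
    int run_script(const char *path);

    /* Perform a trapped file operation with optional before/after triggers. */
    int perform_hooked_op(const char *before, const char *after, int (*file_op)(void)) {
        if (before && !run_script(before))
            return 0;              /* before-trigger failed: operation not performed */
        int ok = file_op();        /* the operation trapped by the redirected hooks  */
        if (ok && after)
            run_script(after);     /* after-trigger, then control returns to the ISR */
        return ok;
    }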
  • FIG. 16 illustrates the screen shot of a Migration Agent of the present invention depicting the user interface of the scheduler.
  • the scripts that have to be triggered can be specified.
  • the jobs can be scheduled to be done at some predefined time.
  • the agent, after reading the configuration settings and the data-mapping template 1605, proceeds to read the scheduled jobs configuration 1610 on the Migration Agent 100 and performs the task at the scheduled time 1615. At all other times, when no task is scheduled, the agent stays in its passive client configuration.
  • FIG. 17 illustrates the computer screen of the present invention depicting the user interface of the scheduler set for recurring activity.
  • the scheduling of the jobs can be done at some predefined time.
  • the agent, after reading the configuration settings and the data-mapping template 1705, proceeds to read the scheduled jobs configuration on the Migration Agent 100 and performs the task at the scheduled time 1710. At all other times, when no task is scheduled, the agent stays in its passive client configuration.
  • the Migration Agent 100 can be configured for real-time data translation or on-line data backup.
  • notification services can be scheduled on the agent of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates generally to the field of accessing and converting data from any source of data to a Relational Database Management System (RDBMS) compliant format. More particularly, the present invention relates to the field of providing data connectivity, which allows legacy Non-RDBMS applications to work in synchronization with any RDBMS in real time, without disturbing the existing legacy process, while at the same time providing the latest technology and features of an RDBMS to Non-RDBMS data without programming effort.

Description

TITLE OF INVENTION
System and Method for Heterogeneous Data Migration in Real-Time
BACKGROUND OF THE INVENTION
The digital evolution process created various technologies which were not compatible with existing systems and technologies. Data created by such legacy technologies inherited a lot of application limitations. As technologies progressed and communication improved, several new intelligent decision-making sub-systems arose. Any predictive or pattern-based Management Information System (MIS) report or analysis requires a history of data to derive results or decision patterns. In many cases the legacy data was incompatible, and the applications generating these data could not be replaced in real time. Typical verticals like stock exchanges, health-care systems or mission-critical live applications could not be replaced with newer systems overnight, as many of them worked 24 hours a day, 365 days a year. Also, millions of dollars have been invested in building and supporting these legacy systems, and replacing a live, stable, tested system with a newer, untested system is practically not viable.
Many solutions like middleware were built to bridge these systems and facilitate a smooth transition, but there existed several limitations. Firstly, any middleware programming requires a thorough knowledge of both systems. Secondly, middleware requires a lot of programming to do this interactive bridging. The cost of such a solution is prohibitive for small and medium-size companies, and maintenance of such a tiered system is economically not viable. Middleware worked only when the source system had some export mechanism and the target system had some import mechanism. But in case of vendor death of either (source or target), or non-provision for import or export, middleware failed miserably. Hence there existed a need for a simpler system which could, without disturbing any legacy business process, provide seamless integration to newer technology and a smoother technology transition. There existed a further need to provide seamless integration without the need for a human to explicitly program the method as per the typical system under consideration.
SUMMARY OF THE INVENTION
To meet the foregoing needs, the present invention provides a software-implemented process, system and method for use in a computing environment. The preferred embodiment of the present invention provides a system and a method by which data migration can be rendered irrespective of server functionality in real time without the need for programming effort.
This invention enables the user working on a legacy system to share and access data after the agent of the present invention is started. In the preferred embodiment, this Migration Agent is located between the legacy application services and the RDBMS server. A request for legacy data by an ODBC/OLEDB/JDBC compliant client of such an RDBMS server is serviced by the Migration Agent, which provides seamless connectivity to legacy application services, including data servers; such data then becomes available to the client that requested it.
The proposed invention has some process pre-requisites, which have to be followed prior to deployment and execution of the invention. The source file system data has to be defragmented once, or never defragmented at all. Secondly, changes to the path and access rights, as well as moving of the source data files, should be avoided after the mapping of the invention is complete. In case defragmentation or re-indexing routines or utilities are executed, a corresponding re-indexing process has to be executed in the proposed invention too. The Migration Agent redirects the file operation activities of a particular set of files, ignoring the other irrelevant ones. The Migration Agent is hooked to the legacy data and the legacy application. The enterprise-wide network is generally a combination of systems that could be legacy systems or ODBC/OLEDB/JDBC compliant systems, including application services, data servers and clients. The agent allows for real-time data migration, and hence data is available for use by the various other processes that want these data. After the agent has hooked the OS file system, all operations such as 'Open', 'Close', 'Tell', 'Flush', 'Seek', 'Write' and 'Read' can be tracked on the mapped files. In other words, the client does not know whether the data resides on the legacy data server or some other data server, thus providing seamless connectivity.
The DML Agent and the Server Agent translate every Data Definition Language (DDL), Data Control Language (DCL) and Data Manipulation Language (DML) command and execute it on the legacy database within the limitations of the legacy data and server design. The Global Cache reads and updates the configuration files and stores a copy of the configuration files.
The Object Translator maps the OS file operation's Handle parameter/argument to an Object ID, the Offset Translator maps the file offset on which the legacy data is read or written to a ROWID (record identifier), and the Operation Translator, according to DDL / DML and using the Network Agent module, translates the file buffer to buffer / tuple data as per the object definition in the relational database. For example, the application data files may include five data files called file1, file2, file3, file4 and file5. Once the files are mapped via the agent to an RDBMS, these files get translated to TABLEs: table1 with name file1 (which the user can change), table2 as file2, and so on. The source files could either be linked files, which contain only DDL, or embedded files that contain both DDL and DML, or pure raw data. Hence, as seen, a single-DDL file of Non-Relational ('NR') data maps to a single object in an RDBMS.
The Source could have Non-ODBC / Non-OLEDB / Non-JDBC compatible files. The data files could be a single DDL file or a multiple DDL file. Consider a single DDL data file with a fixed-sized header and fixed-size records, like a dBase file: the Offset Translator maps the offsets of such different source data files to a unique Row ID. For the file OPEN operation, each non-relational source data file has to be individually opened for Read/Write operations, which is mapped to a connect in the target RDBMS. This Connect opens the data files and all objects in the data file through a single Open command executed on the target database. The OPEN operation on Text as well as Binary source data files can be in Read Only Mode, Read/Write Mode or Append Mode. The READ file operation is mapped to SELECT. If the file is opened in Read/Write mode, the offset has existing data and the buffer data is NULL or a delete data pattern is found (Ex: dBase / FoxBase used '*' as the delete marker), then a DELETE is interpreted, because the existing data is being cleared. The data file is either single or multiple Data Definition Language. The source and the target data files could be of different formats. The column names of the source data file and the target destination data file are displayed. The agent, after reading the configuration settings and the data-mapping template, next reads the scheduled jobs configuration on the agent and performs the task at the scheduled time. Also, the agent can be configured for real-time data translation or on-line data backup, depending on hardware speed and the size and complexity of the data. The proposed terminology "real-time" denotes an infinitesimally small time period, ranging from nanoseconds to milliseconds depending on hardware resources, which means the difference between the request operation and the response conversion is negligibly small.
BRIEF DESCRIPTION OF THE DRAWINGS
The various objects and advantages of the present invention will become apparent to those of ordinary skill in the relevant art after reviewing the following detailed description and accompanying drawings, wherein:
Fig. 1 is a block diagram illustrating the functional blocks of the preferred embodiment of the invention.
Fig. 2(a) and Fig. 2(b) are a flow diagram and its continuation, illustrating the process by which the present invention carries out the translation of heterogeneous file formats into other heterogeneous file formats when the agent is started.
Fig. 3 is a block diagram of the interfacing of the Agent with a computer system. The working of a computer system can be explained as consisting of blocks of layers: the hardware layer, the basic input/output system (BIOS), the Device Driver, the Operating System (OS) Interrupt Service Routine (ISR) Hooks, the OS File System layer and the Application Layer, with the file operation hooks trapped and redirected using the Event Driver of the Agent module.
FIG. 4 is a block diagram illustrating the working of a Legacy System using COBOL, depicting COBOL and its internal components.
FIG. 5 illustrates the working of the legacy system after the agent is started.

FIG. 6 depicts an enterprise-wide network, which may typically include data servers, clients and application services.

Fig. 7 depicts the preferred embodiment of the invention, comprising a mechanism to translate/map Operating System file operations like Open, Close, Tell, Flush, Seek, Write, Read etc. into an Object ID, Row ID and Buffer/Tuple Data, using the original file arguments/parameters like Handle, Offset and Buffer.

FIG. 8 depicts the mechanism using the Object Translator to map legacy source data, which comprises data files lying on their respective paths.

FIG. 9 depicts the mechanism using the Offset Translator to map the offset parameter or argument of a file operation like Seek, Flush, Write or Read, for different types of non-relational source data files, to a unique Row Identifier (ROWID).

FIG. 10 depicts the mechanism using the Operation Translator, according to Data Definition Language (DDL) or Data Manipulation Language (DML), and the Network Agent, by which the return values or results of file operations like Read/Write etc. are mapped to buffer/tuple data.
FIG. 11 depicts how the READ file operation is mapped to SELECT; based on the file mode of the read, it is correspondingly translated to a SELECT on the RDBMS.

FIG. 12 depicts how, if the source file is opened in append mode, any subsequent write is translated to an INSERT after verifying the offset position and the existence of data at that position.
FIG. 13 illustrates the computer screen when an agent of the present invention is started/initiated.

FIG. 14 illustrates the computer screen when an embodiment of the present invention is started/initiated and the Agent task is configured according to the classification of the file operation.
FIG. 15 illustrates the computer screen of the present invention when scripts to be triggered are specified.
FIG. 16 illustrates the computer screen of the present invention depicting the user interface of the scheduler.
FIG. 17 illustrates the computer screen of the present invention depicting the user interface of the scheduler set for recurring activity.
DETAILED DESCRIPTION OF THE INVENTION
While the present invention is susceptible to embodiment in various forms, there is shown in the drawings and will hereinafter be described a presently preferred embodiment, with the understanding that the present disclosure is to be considered an exemplification of the invention and is not intended to limit the invention to the specific embodiment illustrated.
In the present disclosure, the words "a" or "an" are to be taken to include both the singular and the plural. Conversely, any reference to plural items shall, where appropriate, include the singular.
Referring now to the drawings, more particularly FIG. 1, which is a block diagram illustrating the functional blocks of the preferred embodiment of the present invention. As depicted in the diagram, the preferred embodiment of the present invention comprises a Migration Agent 100, which is made up of an Object/Data Mapper 105, a Cache Manager 110, a Request Analyzer 115 and an Event Driver 120. The Migration Agent 100 is connected to a Dispatcher Agent 125, a Network Agent 130, a Global Cache 135, a DDL Map 140, a Parser 145, a DML Agent 150, a Server Agent 155 and a Disk Agent 160.
In the preferred embodiment of the present invention, the Network Agent 130 interfaces with the network, supports all popular communication protocols and is the only communication conduit for all client interfaces. Any mapped file operations that need to be notified to the parent server are communicated via the Network Agent 130 by the Event Driver 120. This is carried out by sending unique packets related to specific operations on files with the requisite parameters. In the preferred embodiment of the present invention, the Network Agent 130 can be plugged into a Remote Database 165, it can transfer an Interrupt Service Routine (ISR) Event 170, and it can also connect to an ODBC-compliant database 175.
The Dispatcher Agent 125 is the buffering module, which schedules the data output across various concurrent requests. Its primary job is to balance the requested data flow across various queries, taking care of query timeouts and buffer formatting, and managing data caching when concurrent requests demand the same data.
The Request Analyzer 115 acts like a command interpreter and manages legacy data file operations as object operations. Each legacy data file maps to one or more objects, and operations like 'read', 'write', 'open', 'close', 'tell', 'flush' etc. are translated to SELECT/INSERT/UPDATE etc. Most legacy databases support only two basic DBMS object entities, tables and indexes; the interpretation, translation and execution of other object entities built on these basic objects is handled by the Request Analyzer 115. For example, if a view is created on legacy table objects linked via the present invention, the execution of the object VIEW, which is inherently not supported by the DBMS architecture, is still interpreted by the analyzer and executed with the help of the DML Agent 150 or the Server Agent 155.
The Cache Manager 110 acts like a resource manager or temporary object repository, managing various database objects. The objects could be physically different file entities on various paths, with their source-to-target translations maintained like a look-up table used for mapping. The Cache Manager 110 serves as the primary object-definition storage, which is always looked up for the translation, interpretation and execution of any command or file operation.
The Event Driver 120 is the heart of the Migration Agent 100. The entire scheduling architecture of the Migration Agent 100 is built around a co-operative multithreading kernel. The Interrupt Service Routine hook installed during agent startup serves as the prime initiator for sourcing or sinking requests. Any operations on the mapped files are translated into network/disk events primarily by the Event Driver. For example, in case the agent functionality is mapped for supporting a real-time online backup, every write request operation on the mapped file of the legacy server is archived onto a local database file via the Disk Agent 160 or onto a Remote Database 165 via the Network Agent 130. Hence any crash-recovery process demanding data can be routed to the local database via the Disk Agent 160 or to the Remote Database 165 via the network. Likewise, the ODBC clients 175 can use the data created by these parallel operations to service any requests without disturbing the original data files. Further, the Event Driver 120 functions as the legacy interface through the Interrupt Service Routine Hooks 180.
The Object and Data Mapper 105 translates the legacy data file operations like read/write etc. into the relational entities that are required to map non-RDBMS data to relational data. In other words, each file operation or data operation is translated to the corresponding database object identifier(s) on which the operations are performed (e.g., DDL, DML etc.).
The Parser 145 is a generic SQL-compatible syntax and semantic parser and analyzer, which validates ODBC client requests and, in conjunction with the Object/Data Mapper 105, translates the objects in the query to the corresponding data files in the legacy database file system. Many operations, which cannot be performed given the limitations of the legacy data and the available data-type support, are shunted by the Parser 145 itself before the request is allowed to be executed. For example, most legacy databases never had Large Object (LOB) or Variable Character (varchar) support, which are vital for running an enterprise. Hence standard data types with fixed data widths, and the corresponding available database functionalities, are analyzed by the Parser 145 and allowed to be processed.
The Disk Agent 160 performs the dual task of serving the hook for the legacy data as well as acting as a translator from ODBC commands to the legacy data format and vice versa. The Disk Agent 160 monitors every file operation on the legacy data files, and the relevant information required by the parent server is notified by the Event Driver 120 as a database command with the help of the Disk Agent 160. In the event that the Migration Agent 100 is configured for real-time local backup, the translation of flat-file data across database files into a centralized RDBMS format is also performed by the Disk Agent 160.
The DML Agent 150 and the Server Agent 155 translate every Data Definition Language, Data Control Language and Data Manipulation Language command and execute it on the legacy database, confining it to the limitations of the legacy data and server design. Extending RDBMS functionality to be performed on legacy data is the prime objective of these agents, but the restrictions of the capturing mode of legacy data bind this functionality to the executable minimum.
The Global Cache 135 reads and updates the configuration files and stores a copy of them. The DDL Map 140 module prepares the updated data translation of the files and loads it into the Global Cache 135 for any operations on the files from the configuration settings. Since most object definitions reside here, the Global Cache 135 data helps in analyzing and translating a write operation to an INSERT/UPDATE/DELETE operation, based on the data definition, file offsets and data buffer specified in the WRITE operation (e.g., in case the offset is not an integral multiple of the record size, the write operation can be an UPDATE or a DELETE; upon analysis of the buffer, the WRITE operation is further translated to a DELETE if NULL or delete data patterns are found).

Fig. 2 is a flow diagram illustrating the process by which the preferred embodiment of the present invention carries out the translation of heterogeneous data files to other heterogeneous file formats after the Migration Agent 100 is started 200. While the Migration Agent 100 is being installed, the configuration file is automatically generated and configured. As soon as the Migration Agent 100 is started or initialized as a service, its first objective is to read the configuration settings file and the data-mapping templates of the system residing in the Migration Agent 100 module 202. Next, the Migration Agent 100 reads the scheduled jobs 204 as per the configuration settings, such as the object, that is, the file name, and the set of functionalities, such as the file operations enabled for execution. For example, the Migration Agent 100 can be an online backup system, a notification system or just a passive client, which has to respond only when queries arrive. As per the functionalities expected, the objects of those functionalities need to be obtained from the configuration. For example, for some files the agent can be configured just for notification, but for others as a backup sub-system. The specified functionalities may demand automatic execution of a predefined event at a defined interval. The settings for these scheduled tasks are also verified, and a time-specific thread is forked to handle the jobs so scheduled. Next, the Migration Agent 100 probes the environment and compares the last state and the current state of the environment 206. Since most legacy databases depend on the OS file system, probing the environment and comparing the last and current states is needed to validate that the operations and objects configured for agent functionality can execute successfully. This process basically checks the database files, paths, permissions, attributes and current state, whether in use or in a usable state. Doing this is a necessity, as other applications may use the files in an exclusive mode or may have changed the attributes to hidden or read-only. Also, the files or paths may have been deleted or moved to a place not matching the configuration settings or execution parameters. The Migration Agent 100 then proceeds to map and connect to the last known servers, if any server is available and configured 208. It then proceeds to prepare a DDL map table from the configuration defaults 210. The Migration Agent 100 then proceeds to validate the map details with the current files and their status in the system 212. If the map details are found to be invalid, the Migration Agent 100 proceeds to generate an error message 214.
However, if the map details are found to be valid, the Migration Agent 100 proceeds to load the map table with the updated or validated data 216. The Migration Agent 100 then hooks to the system's interrupt service routines for file handling, once all the prerequisites as per the configuration are met. This is primarily done to trap any file activity in the system, which can be filtered across applications using the mapped file data configured to be operated upon. If the file objects to be operated upon are available as configured in the configuration file, a DDL map for the data translation of these files is loaded into the global cache for any operations on the files from the configuration settings. The Migration Agent 100 then hooks to the system ISR and captures the prerequisite system information 218. It then waits for the hooked application and mapped files to be executed by any external application 220 and proceeds to poll for any pending connects; in the event a pending connect is found, a connection is established 222. The Migration Agent 100 then checks if a valid event is triggered 224. In the event of a valid operation or event being triggered, the Migration Agent 100 classifies the operation 226 and, as per the operation, various steps are triggered. In the event the operation is invalid, the Migration Agent 100 goes back to waiting for the hooked application and mapped files 220. The Migration Agent 100 functionality varies as per the operation type, like 'Open', 'Close', 'Read', 'Write' etc., and the functionality configured for trapping these operations.
The Migration Agent 100 then proceeds to check if the operation is an Open file operation 228. In the event that the operation is an Open file operation, the Migration Agent 100 proceeds to check if it is a mapped file 230. The translation map is basically the target server's interpretation of the byte pattern of the data in the file and can vary as per the various server representations. For example, if a single-DDL data file has a few columns of data, the translation of the same file data for the different servers connected to the agent can vary as per the data-type interpretation of the target server. Hence, as per the connectivity of the ODBC client, the byte interpretation or the DDL map can vary. In the event it is not a mapped file, the Migration Agent 100 proceeds to execute the default procedure 231, i.e. the file system's native interrupt service routine ('ISR'). In the event that it is a mapped file, the Migration Agent 100 proceeds to check if the mapped file is successfully opened for any operation 232. A mapped data file, when opened for any operation irrespective of the application, is cached with its handle ID and the related object definition and translation map as set in the configuration file 234. After this process, the respective usage count is incremented as per the file operation 236. The Migration Agent 100 then proceeds to wait for the next operation 238.
On checking if the operation is an open file operation 228, in the event that it is not, the Migration Agent 100 proceeds to check if it is a Close file operation 240.
In the event it is a Close file operation, the Migration Agent 100 proceeds to check if the file handle is mapped 242. In the event that the file handle is not mapped, the Migration Agent 100 proceeds to execute the default procedure 231. However, if the file handle is mapped, it proceeds to check if the close operation is successful 244. In the event of an unsuccessful close operation, the Migration Agent 100 proceeds to check the count 246; in other words, it checks if any users are left that have open file handles for the same file. In the event of a successful close operation, the count is decremented 248, and the Migration Agent 100 then proceeds to check the count 246. On checking the count, once no open handles remain, the Migration Agent 100 proceeds to remove the object path from the cache 250 and waits for the next operation 238.
In the event that the operation is not a Close file operation, the Migration Agent 100 proceeds to classify it as a Read/Write/Seek/Tell or Flush operation 252. The Migration Agent 100 then checks if the file handle is mapped 254. In the event that the file handle is not mapped, the Migration Agent 100 executes the default procedure 231. However, if the file handle is mapped, the Migration Agent 100 uses the handle and the operation parameters to source or sink events using look-up operations on the mapped object 256. The Migration Agent 100 then checks if the operation requires notification before execution 258. In the event that the operation does not require notification before execution, it proceeds to check if the operation is successful 260. However, if the operation requires notification before execution, the Migration Agent 100 proceeds to generate the notification with the operation parameters 262 and then checks if the operation is successful 260. In the event that the operation is not successful, it generates an error code 264. In the event that the operation is successful, the Migration Agent 100 checks if notification is required after the successful operation execution 266 and proceeds to wait for the next operation 238.
The proposed invention supports triggers or user-defined events to be executed before, during or after the hooked operations. In case an event notification is required (triggered) after the successful operation execution, the notification is generated along with the operation parameters after the operation and reported 268. This feature allows altering or using the parameters for intelligent decision-making before or after the execution of the event, together with the result of the execution. The Migration Agent 100 then proceeds to wait for the next operation 238.
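The dispatch logic of Fig. 2 can be condensed into the following non-limiting sketch; the function signature, dictionary layout and callables are illustrative assumptions, with the step numbers in the comments referring to the flow described above:

```python
# Condensed sketch of the Fig. 2 dispatch loop (structure and names assumed;
# the real agent traps OS interrupt service routines rather than a Python call).
def dispatch(op, handle, params, cache, default_isr, notify, execute):
    entry = cache.get(handle)
    if entry is None:                        # not a mapped file -> step 231
        return default_isr(op, handle, params)
    if "before" in entry["notify"]:          # pre-notification -> step 262
        notify(op, handle, params)
    ok, result = execute(op, entry, params)  # translated operation -> step 256
    if not ok:
        return ("error", result)             # error code -> step 264
    if "after" in entry["notify"]:           # post-notification -> step 268
        notify(op, handle, params, result)
    return ("ok", result)                    # then wait for next op -> step 238
```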
Fig. 3 is a block diagram depicting the interfacing of the Migration Agent 100 with a computer system. It depicts the basic working layers of a computer system and a Migration Agent 100 of the current invention interfaced with it, in analogy to the Open Systems Interconnection (OSI) reference model layers. The working of the computer system can be explained as consisting of blocks of layers: the Hardware layer 300, the Basic Input/Output System (BIOS) 305, the Device Driver 310, the Operating System Interrupt Service Routine (ISR) Hooks 315, the OS File System layer 320 and the Application Layer 325, with the file-operation hooks trapped and redirected 330 using the Event Driver 120, which forms a part of the Migration Agent 100, in addition to the Network Agent 130, which is connected to the network. The lowest layer is the Hardware layer 300, which includes the physical devices, to name a few: hard disk drives (HDD), floppy disk drives (FDD), the monitor, the motherboard etc.
The Basic Input/Output System (BIOS) 305 is above the Hardware layer 300 and below the Device Driver layer 310. The BIOS 305 is the program a personal computer's microprocessor uses to get the computer system started after it is turned on. It also manages data flow between the computer's operating system and attached devices such as the hard disk, video adapter, keyboard, mouse and printer. The BIOS 305 is a program that is made accessible to the microprocessor on an erasable programmable read-only memory (EPROM) chip. When a computer is switched on, the microprocessor passes control to the BIOS program 305. When the BIOS 305 boots up a computer, it first determines whether all of the attachments are in place and operational, and then it loads the operating system (or key parts of it) into the computer's random access memory (RAM) from the hard disk or diskette drive. With the BIOS 305, the operating system and its applications are freed from having to understand exact details (such as hardware addresses) about the attached input/output devices. Although the BIOS 305 is theoretically always the intermediary between the microprocessor and input/output device control information and data flow, in some cases the BIOS 305 can arrange for data to flow directly to memory from devices (such as video cards) that require faster data flow to be effective.
The Device Driver block 310 is above the BIOS layer 305 and below the Operating System Interrupt Service Routine Hooks layer 315. In general, device drivers are programs written so that the computer knows what to do with a device or a virtual device (a piece of software written to act like a device).
The Operating System Interrupt Service Routine (OS ISR) Hooks layer 315 is above the Device Driver block 310 and below the OS File System layer 320. The Migration Agent 100 hooks to the OS ISR block and traps operations like Open, Close, Read, Write, Seek, Tell, Flush etc. on the files. The Migration Agent 100 interfaces with the OS ISR Hooks 315, and the hooks are trapped by the Redirected Hook Module 330. The Redirected Hook Module 330 in turn hands these hooks to the Event Driver 120. The Event Driver 120 then interacts with the Migration Agent 100.
The OS File System 320 lies below the Application layer 325 and above the OS ISR Hooks 315. The OS File System 320 is the way in which files are named and where they are placed logically for storage and retrieval. The DOS, Windows, OS/2, Macintosh and UNIX-based operating systems all have file systems in which files are placed somewhere in a hierarchical (tree) structure. A file is placed in a directory (a folder in Windows) or subdirectory at the desired place in the tree structure. The Application Layer 325 is the top-most block of the computer system. It contains data in user-defined formats and applications using business logic.
The Migration Agent 100 residing on the legacy server is invoked as a service or a daemon process. The Migration Agent 100 redirects the file-operation activities of a particular set of files, ignoring the other, irrelevant ones. Generally, only very few operations are supported on files, like open, close, read, write, tell, seek, flush etc. The initial configuration or communication between the Migration Agent 100 and the client's OS can establish the files to be monitored or operated upon. The Migration Agent 100 caches these file details, their mapped Data Definition Language tables are monitored for every Interrupt Service Routine invocation, and their operational parameters are tracked in memory.
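Since the agent's ISR hooking is operating-system specific, a user-space analogue can only hint at the mechanism. The following non-limiting sketch substitutes a Python wrapper around open() for the real interrupt-service-routine redirection; the monitored paths and bookkeeping structure are assumptions:

```python
# Illustrative user-space analogue of the hooking step (the real agent
# redirects OS interrupt service routines; this wrapper merely stands in).
import builtins

MONITORED = {"/data/app/file1", "/data/app/file2"}   # assumed configured paths
_native_open = builtins.open
_handle_cache = {}                                   # handle -> mapped-file details

def hooked_open(path, mode="r", *args, **kwargs):
    f = _native_open(path, mode, *args, **kwargs)
    if str(path) in MONITORED:                       # only mapped files are trapped
        _handle_cache[id(f)] = {"path": str(path), "mode": mode}
    return f                                         # others fall through untouched

builtins.open = hooked_open   # analogous to redirecting the file-handling ISR
```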
FIG. 4(a) is a block diagram illustrating one embodiment of the working of a legacy system using COBOL, depicting COBOL and its internal components. Part (a) of this figure illustrates the working of a legacy system before the Migration Agent 100 is started. In information technology, legacy applications 400 and data 405 are those that have been inherited from languages, platforms and techniques earlier than current technology. COBOL (Common Business Oriented Language) was the first widely used programming language for business applications. Many payroll, accounting and other business application programs written in COBOL over the past years are still in use. The language is generally perceived as out-of-date, and COBOL programs are generally viewed as legacy applications. A popular version of COBOL had ISAM (Indexed Sequential Access Method) 410, a file management system developed at IBM that allows records to be accessed either sequentially (in the order they were entered) or randomly (with an index). Each index defines a different ordering of the records. An employee database may have several indexes, based on the information being sought. For example, a name index may order employees alphabetically by last name, while a department index may order employees by their department. A key is specified in each index; for an alphabetical index of employee names, the last name field would be the key. The ISAM 410 is interfaced with the OS files 415 and hence redirects the OS ISR hooks, as depicted in the figure. Now referring to the second part of Fig. 4, illustrating the preferred embodiment of the working of a legacy application 440 after the Migration Agent 100 is started: as an analogy for the working of the current invention, consider the popular COBOL version with ISAM 420 as a Terminate and Stay Resident (TSR) program that hooks to the reserved interrupts of the operating system to manage database activities on the Legacy Data 425. This is similar to the working of the preferred embodiment of the present invention, where the Migration Agent 100 hooks to the OS ISR hooks of the operating system's file system 430. Various functional options were handled and translated by the ISAM. The preferred embodiment of the present invention, as illustrated in the diagram, works parallel to the ISAM 420, wherein the Agent Software 435 has the capability of mimicking the work of the ISAM 420. In other words, an optional feature is provided whereby the Migration Agent 100 can bypass any input/output on the original/legacy data.
Fig. 5 illustrates an alternative embodiment of the present invention, depicting the working of a legacy system after the Migration Agent 100 is started as part of the Agent Software Service 520. The Migration Agent 100 of the present invention is invoked as a service or daemon process. The Agent Software Service 520 mimics the working of the ISAM 525, and the Migration Agent 100 is hooked to the Legacy Data 515 and the Legacy Application 500. The Migration Agent 100 redirects the file-operation activities of a particular set of files, ignoring the other, irrelevant ones. Besides, the invention can also trap file operations like open, close, read, tell, seek, flush etc. on newer RDBMSs that could be ODBC-compliant 505. The initial configuration or communication between the Migration Agent 100 and the client's OS files 510 can establish the files to be monitored or operated upon. The Migration Agent 100 caches these file details, their mapped DDL tables are monitored for every ISR invocation, and their operational parameters are tracked.
Various functional options were provided, which were translated to function indexes; the legacy application passed parameters as per the function requirements and invoked the TSR to execute database functionality. For example, the Migration Agent 100 service has an option to hook to this parent ISR and provide RDBMS functionality. Alternatively, the Migration Agent 100 has the option to ignore the parent DB engine and, without any application change, translate the required functionality to ODBC-compatible commands. Hence the Legacy Application 500, without any change, starts working in an ODBC-compatible standard, and data generated from the legacy application is seamlessly available to the RDBMS application.
FIG. 6 depicts an enterprise-wide network that may typically include Data Servers 600, 605, Clients 610, 615, 620 and Application Services 625. The enterprise-wide network is generally a combination of systems that could be legacy systems or ODBC/OLEDB/JDBC-compliant systems, including Application Services 625, Data Servers 600, 605 and Clients 610, 615, 620. That is to say, the data or application servers could be either a Legacy Server 600 or an ODBC/OLEDB/JDBC-compliant Server 605. Further, the clients can be a Legacy Client 610, an ODBC/OLE client 615 or a Remote RDBMS console client 620. The Migration Agent 100 provides for seamless connectivity and makes data available to the various clients irrespective of their nature. The Migration Agent 100 allows for real-time data migration, and hence data is available for use by the various other processes that need it.
As depicted in the diagram, the Migration Agent 100 provides an interactive way of handling changes in data once the agent hooks to the operating system's interrupt service routine. The Migration Agent 100 uses its Event Driver 120 to send packets of data as well as headers. After the Migration Agent 100 is hooked, all operations like Open, Close, Tell, Flush, Seek, Write and Read can be tracked on the mapped files. The Event Driver 120 translates these operations into events, and any change to the source data is notified as a message by the agent kernel scheduling the various operations. Likewise, any request query invoked from the ODBC clients 615 invokes an event, which translates the query data 630 into the requisite operations to fetch results using the Message Bus 635.
In other words, the client does not know whether the data resides on the Legacy Data Server 600 or any other data server, thus providing seamless connectivity. For example, consider an ODBC/OLEDB/JDBC-compliant database Client 615 requesting data from a Legacy Server application 625. In this case, the requested data cannot be made available without some migration tool that allows seamless real-time migration. As depicted in Fig. 1, the Migration Agent 100 provides for such seamless real-time connectivity and sits between the Legacy Application Services 625 and the RDBMS server 605. A request for legacy data by an ODBC/OLEDB/JDBC-compliant client of such an RDBMS Server 605 is serviced by the Migration Agent 100, which provides seamless connectivity to the Legacy Application Services, including the Data Servers, and such data becomes available to the client that requested it.
Consider another example where a Legacy Client 610 can get requested data from an RDBMS Server 605 through the use of the Migration Agent 100. The Migration Agent 100 provides seamless connectivity between clients irrespective of their nature and data servers or RDBMS servers. The Migration Agent 100 hooks to the Operating System's Interrupt service routine so that it can track the operations being performed on predefined files.
Another scenario involves a remote RDBMS Console Client 620 requesting data from the Legacy Application Services 625. As seen from the diagram, the Migration Agent 100 provides a way for seamless connectivity between such a remote client and the legacy services. The Migration Agent 100 hooks to the operating system's interrupt service routine so that it can track the operations being performed on the predefined files.
Fig. 7 depicts the preferred embodiment of the invention and comprises a mechanism to translate/map Operating System file operations such as 'Open' 700, 'Close' 705, 'Tell' 710, 'Flush' 715, 'Seek' 720, 'Write' 725, 'Read' 730 etc. into an Object ID 735, Row ID 740 and Buffer/Tuple Data 745, using the original file arguments/parameters like Handle 750, Offset 755 and Buffer 760. The Request Analyzer 115, using the Object/Data Mapper 105, does the translation as shown in the figure. The Object Translator 765 maps the OS file operation's Handle parameter/argument to the Object ID 735; the Offset Translator maps the file offset at which the legacy data is read or written to a ROWID (record identifier) 740; and the Operation Translator according to DDL/DML 775, using the Network Agent module 130, translates the file buffer to buffer/tuple data 745 as per the object definition in the relational database. The original file arguments/parameters, such as the handle used by the various file operations, are connected to the hooked files 780. The file operation 'Open' 700 takes a path and returns a handle 790. The file operation 'Close' 705 takes a handle. Other similar operations like 'Tell' 710, 'Seek' 720 etc. take a handle and return a position 792. The 'Write' 725 and 'Read' 730 operations take a handle 750, offset 755 and buffer 760 and return results like bytes written successfully 794. The Operation Translator according to DDL/DML 775 can be used clubbed with the Network Agent 130 and the other Migration Agent modules.
Fig. 8 depicts the mechanism using the Object Translator 765 to map legacy source data, which comprises data files lying on their respective paths. Each file can be a single-DDL file or a multiple-DDL file, that is, one single file having data for more than one object. As illustrated in the figure, suppose we have legacy data on a source computer's 800 hard disk drive (HDD). Consider source data files lying in their respective space/path, say an application directory 810. Let such application data files 815 be five data files named file1 820, file2 825, file3 830, file4 835 and file5 840. Once we map these files via the Migration Agent 100 to an RDBMS 845, these files are translated to TABLE objects: table1 with the name file1 850 (which the user can change), table2 as file2 855, up to table5 named file5 860, and so on. The source files could either be linked files, which contain only DDL, or embedded files that contain both DDL and DML. Hence a single-DDL non-relational data file maps to a single object in an RDBMS.
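A non-limiting sketch of this file-to-table translation follows; the column definitions are invented for the example, standing in for the DDL maps read from the configuration:

```python
# Illustrative sketch: emitting a CREATE TABLE per mapped legacy file, as in
# Fig. 8 (columns are assumed, not from the disclosure).
ddl_maps = {
    "file1": [("empno", "INTEGER"), ("name", "CHAR(30)")],
    "file2": [("deptno", "INTEGER"), ("dept", "CHAR(20)")],
}

for n, (fname, cols) in enumerate(sorted(ddl_maps.items()), start=1):
    column_sql = ", ".join(f"{c} {t}" for c, t in cols)
    print(f"CREATE TABLE table{n} ({column_sql});  -- mapped from {fname}")
```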
Now referring to FIG. 9, depicting the mechanism using the Offset Translator 770 to map the offset parameter or argument of a file operation such as 'Seek', 'Flush', 'Write' or 'Read' for different types of non-relational source data files to a ROWID 740. A ROWID 740 in an RDBMS is the unique physical position of a record, similar to the absolute value of an offset. Care has to be taken to update these translations if the source data is re-indexed or packed. The source could have non-relational data files such as those shown in the figure. The data files could be single-DDL files 900 or multiple-DDL files 905. Further, the single-DDL files 900 could have a single or optional, fixed or variable-sized header, and the records could be either of fixed or variable size. Consider a single-DDL data file with a fixed-sized header and fixed-size records 920, such as a dBase file. Consider another single-DDL data file with an optional header and fixed-size records 925, such as some versions of COBOL files. Consider another single-DDL data file with an optional variable-sized header and variable-sized records prefixed by a record length (RL) 930, such as one produced by a user-defined C application. Various other types of multiple-DDL source data files 905 are also possible, as shown in the figure. Consider a multiple-DDL data file 905 with an optional header and record data preceded by Object Information (OI) and then a Record Length (RL) 935, as shown in the figure. Consider another multiple-DDL data file 905 with an optional header where the Object Information is part of the record data preceded by the Record Length (RL) 940, as shown in the figure. The Offset Translator 770 maps the offsets of these different source data files to a unique ROWID 740. The Object Translator 765 creates the table object in the target database, and an extent of blocks is allocated 950 to it. This allocated space is called the object extent, and each such extent consists of multiple blocks. A block 955 in turn consists of a Block Header 960 and a tuple index 966 that points to the address of each unique ROWID 740 where data is recorded. Hence the record/tuple having a unique row ID (ROWID) 740 is created after the Offset Translator 770 maps the file offset parameter/argument.
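For the fixed-header, fixed-record case, the Offset Translator's arithmetic reduces to integer division, as in the following non-limiting sketch (the header and record sizes are assumed values):

```python
# Offset <-> ROWID arithmetic for a dBase-like single-DDL file (sizes assumed;
# 1-based ROWIDs). Re-indexing or packing the source invalidates the mapping.
def offset_to_rowid(offset, header_size=32, record_size=64):
    if offset < header_size or (offset - header_size) % record_size:
        raise ValueError("offset does not fall on a record boundary")
    return (offset - header_size) // record_size + 1

def rowid_to_offset(rowid, header_size=32, record_size=64):
    # Inverse mapping, for positioning a file write from a target-side ROWID.
    return header_size + (rowid - 1) * record_size
```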
Now referring to FIG. 10, depicting the mechanism using the Operation Translator according to DDL/DML 775 and the Network Agent 130, by which the return values or results of file operations like Read/Write etc. are mapped to buffer/tuple data 745.
For the file OPEN operation, each non-relational source data file has to be individually opened for Read/Write operations, which is mapped to a CONNECT 1015 on the target RDBMS. This CONNECT opens the data files and all objects in the data files through a single Open command executed on the target database. The different modes of opening the source file, whether Read mode 1020, Read/Write mode 1025 or Append mode 1030, are translated to appropriate cursor types whenever reads/writes are performed. The OPEN operation on text as well as binary source data files can be in Read Only mode 1020, Read/Write mode 1025 or Append mode 1030. An OPEN on a source data file in Read Only mode maps to a Read Only cursor 1035 on the respective target file. Similarly, an OPEN on source data files in Read/Write mode 1025 as well as Append mode 1030 maps to Keyset/Dynamic cursors 1040 on the respective target file.
Similarly, only the CLOSE all command 1045 at the source is translated to a DISCONNECT 1050 on the target database. The user schema, that is, the user name and password, has to be specified along with the database and server details during connect.
The difference between the file operations lies in the interpretation of the objects, which could belong either to a DBMS or to an RDBMS. Hence the Open file operation, which the DBMS has to perform on every individual non-relational data file, can be translated to a single CONNECT operation, which opens all objects for operations similar to an open file (obviously in Read/Write mode). Similarly, a read, based on its arguments, can be translated to 'SELECT'; an append operation, based on its data, can be translated to 'INSERT'; and so on.
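A non-limiting sketch of the open-mode translation follows; the mode strings, cursor names and printed CONNECT are assumptions standing in for the target database's actual interfaces:

```python
# Illustrative: source open modes -> target cursor types, with one CONNECT
# covering all mapped objects, as in Fig. 10 (all values assumed).
CURSOR_FOR_MODE = {
    "r":  "READ_ONLY",        # Read Only mode  -> read-only cursor 1035
    "r+": "KEYSET_DYNAMIC",   # Read/Write mode -> keyset/dynamic cursor 1040
    "a":  "KEYSET_DYNAMIC",   # Append mode     -> keyset/dynamic cursor 1040
}

connected = False

def open_source_file(path, mode):
    # One CONNECT per session opens all mapped objects on the target database
    # (user schema, database and server details are assumed configuration).
    global connected
    if not connected:
        print("CONNECT user/password@server/database")
        connected = True
    return (path, CURSOR_FOR_MODE[mode])
```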
Now referring to FIG. 11, the 'READ' 1100 file operation is mapped to 'SELECT' 1105. Based on the file mode of the read, it is correspondingly translated to a 'SELECT' 1105 on the RDBMS. The read mode depends on the cursor 1110; it could be Read Only, Keyset or Dynamic, based on how the source file was opened. For a single record, the buffer size and current position specify the records to be read, which can be optimized to read all records 1120. As an example, consider that the current cursor position 1125 is at byte 400 and the next 400 bytes are to be read, that is, up to byte number 800, 1130. This is translated to a record in the target database by a 'SELECT' with the ROWID as parameter 1135.
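The worked example above can be expressed as follows, assuming 100-byte records and no file header; the table name and SELECT shape are illustrative:

```python
# Fig. 11 worked example: a READ of 400 bytes at position 400 spans several
# records and becomes one SELECT (record size of 100 bytes is an assumption).
RECORD_SIZE = 100

def read_to_select(table, position, nbytes):
    first = position // RECORD_SIZE + 1                  # byte 400 -> ROWID 5
    last = (position + nbytes - 1) // RECORD_SIZE + 1    # byte 799 -> ROWID 8
    return f"SELECT * FROM {table} WHERE ROWID BETWEEN {first} AND {last}"

print(read_to_select("table1", 400, 400))
# SELECT * FROM table1 WHERE ROWID BETWEEN 5 AND 8
```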
Now referring to FIG. 12, if the source file 1200 is opened in append mode, then any subsequent 'write' 1210 is translated to an 'INSERT' 1215, after verifying the offset position and the existence of data at that position. If the file is opened in Read/Write mode and data exists at the offset, the write is translated to an 'UPDATE' 1220 statement (assuming the prerequisites specified earlier are met; this assumption enforces that no empty blocks of data existed). If the file is opened in Read/Write mode, data exists at the offset and the buffer data is NULL (or a DELETE pattern), then a 'DELETE' 1225 is interpreted, because the existing data is being cleared. Many non-relational systems use a delete marker at a particular offset with respect to the record data.
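A non-limiting sketch of this write classification follows; the delete marker '*' matches the dBase/FoxBASE convention mentioned earlier, while the mode strings and SQL shapes are assumptions:

```python
# Sketch of the Fig. 12 write classification (names and modes illustrative).
def write_to_sql(table, rowid, buffer, mode, has_existing_data):
    if mode == "a" and not has_existing_data:
        return f"INSERT INTO {table} VALUES (...)"             # append -> INSERT
    if mode == "r+" and has_existing_data:
        if buffer is None or buffer[:1] == b"*":               # NULL/delete pattern
            return f"DELETE FROM {table} WHERE ROWID = {rowid}"
        return f"UPDATE {table} SET ... WHERE ROWID = {rowid}"
    raise ValueError("write does not match a translatable pattern")
```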
TELL/SEEK etc. are generally used to position the current file handle; these may optionally be translated to a ROWID for large amounts of data. Similarly, the FLUSH file operation may rarely be used unless the OS supports caching utilities like SMARTDrive; it can be used, if required, to trigger CHECKPOINTING.
FIG. 13 illustrates the computer screen when a Migration Agent 100 of the present invention is started/initiated.
FIG. 14 illustrates the computer screen when a preferred embodiment of the present invention is started/initiated and the Migration Agent 100 task is configured according to the classification of the file operation. The data file is either a single Data Definition Language file 1405 or a multiple-DDL file 1410. The source file, irrespective of type and format, can be selected 1415 from the browse option provided.
Also, the Destination Object 1420 can be user-defined. The source and the target data files could be of different formats. The column names 1425 of the source data file and the target/destination data file are displayed, as are other parameters like the data types 1430, precision 1435 and scale 1440 for both source and destination. All these parameters/arguments can be set as per the user's choice or needs.
FIG. 15 illustrates the user interface 1500 of the present invention when the scripts to be triggered are specified 1510. The scripts could be in any language; even user-defined scripts are supported. For the file operations that are trapped, the scripts to be triggered are specified, and the scripts can be set as required. The configuration can be done to specify the scripts to be triggered corresponding to the file operations. The scripts could be triggered before 1515 or after 1520 the translation operation. If the script is to be triggered before the file translation/mapping, the file operation trapped by the redirected hooks is performed after the trigger event returns a success flag. If the script is set to be triggered after the file translation/mapping, the file operation trapped by the redirected hooks is performed before the trigger event occurs, and the hook then returns to the original ISR.
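A non-limiting sketch of such a trigger configuration follows; the script names are hypothetical and the invocation via a shell is an assumption:

```python
# Illustrative configuration: binding user scripts to trapped operations,
# fired before or after translation (Fig. 15). Script names are hypothetical.
import subprocess

TRIGGERS = {
    ("write", "before"): "validate_input.sh",  # must succeed before the write
    ("write", "after"):  "notify_backup.sh",   # fired once translation is done
}

def run_trigger(op, phase):
    script = TRIGGERS.get((op, phase))
    if script is None:
        return True
    result = subprocess.run(["/bin/sh", script])   # user-defined script
    return result.returncode == 0                  # success flag gates the file op
```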
FIG. 16 illustrates a screen shot of the Migration Agent of the present invention depicting the user interface of the scheduler, where it can be specified when scripts have to be triggered. In other words, jobs can be scheduled to be done at some predefined time. The agent, after reading the configuration settings and the data-mapping template 1605, proceeds to read the scheduled-jobs configuration 1610 on the Migration Agent 100 and performs the task at the scheduled time 1615. At all other times, when no task is scheduled, the agent behaves as a passive client.
FIG. 17 illustrates the computer screen of the present invention depicting the user interface of the scheduler set for a recurring activity. The scheduling of the jobs can be done at some predefined time. The agent, after reading the configuration settings and data-mapping template 1705, proceeds to read the scheduled-jobs configuration on the Migration Agent 100 and performs the task at the scheduled time 1710. At all other times, when no task is scheduled, the agent behaves as a passive client. The Migration Agent 100 can also be configured for real-time data translation or on-line data backup, and notification services can be scheduled on the agent of the present invention.

Claims

What is claimed is:
1. A data migration system to migrate data from a data file created independent of its source of generation to a database compliant system, comprising:
a request analyzer to manage file operations and interpret instructions received in a predetermined format;
a messaging kernel to coordinate tasks and interface between the operating system and the user;
a memory storage means to store and maintain definitions required for translation; and
an object and data mapper to perform said translation of said data files for any operations on said data files based on definitions in said memory storage means and predetermined configuration files;

whereby said data migration system is capable of migrating said data files, including from a type of non-database compliant system, to a database compliant system without loss of data.
2. The data migration system as recited in claim 1 contains means for performing said migration process in real-time.
3. The data migration system as recited in claim 1 is further interfaced with a caching means to read and update said data file as per said predetermined configuration files.
4. The data migration system as recited in claim 1 is configured to map each said data file object and operation to a corresponding relational database object and operation using a map created by said request analyzer.
5. The data migration system as recited in claim 4 wherein the request analyzer further comprises:
an object translator to map said data file handle operations to a corresponding object identifier on said database compliant system;

an offset translator to map said data file offset operations to a corresponding record identifier on said database compliant system; and

an operation translator to translate said data file buffer operations to corresponding tuple data as per the object definition in said database compliant system.
6. The data migration system as recited in claim 5 facilitates legacy applications created on file-based database systems to interact and interface seamlessly with contemporary database technologies.
7. The data migration system as recited in claim 1 wherein the messaging kernel gains control of said operating system to channel and control said operating system specific operations by hooking said operating system file system.
8. The data migration system as recited in claim 7 wherein the messaging kernel is programmed using a finite state machine model wherein user- or system-defined triggers are executed before, after or during said data file operations to perform user-defined functions.
9. A method of data migration to migrate data from a data file created independent of its source of generation to a database compliant system, comprising the steps of:
translating and mapping a file pointer of said data file to an object identifier of the target database using an object translator;

translating and mapping an offset pointer of said data file to a record identifier of the target database using an offset translator; and

translating and mapping a memory pointer of said data file to corresponding database compliant operations on the target database using an operation translator;
whereby said data migration system is capable of migrating said data files, using a set of predetermined definitions and configuration files, to a database compliant system without loss of data.
10. The method as recited in claim 9 wherein a read operation is mapped and translated to a select operation determining a unique position of a record similar to the absolute value of said offset on said database compliant system using said file pointer, said offset pointer and said memory pointer.
11. The method as recited in claim 9 wherein a write operation is mapped and translated to either an insert, delete or update operation and a unique position of a record similar to the absolute value of said offset on said database compliant system using said file pointer, said offset pointer and said memory pointer.
12. The method as recited in claim 9 includes means for storing said data file translations within a global cache for reading and updating said data files as per said predetermined configuration files.
13. The method as recited in claim 12 wherein said messaging kernel controls specific commands destined for said data file rather than the operating system by hooking the file system of said operating system to control said operating system commands.
14. The method as recited in claim 9 facilitates legacy applications created on file-based database systems to interact and interface seamlessly with contemporary database technologies.
15. A method of executing operating system file operations on a database compliant file, and vice-versa, comprising the steps of:
translating a file pointer on said operating system file to an object identifier on said database compliant file using an object translator or vice-versa;
translating an offset pointer on said operating system file to a record identifier on said database compliant file using an offset translator or vice-versa;
translating a buffer pointer on said operating system file to a translation buffer if target file is a datafile or translating said buffer pointer to an SQL statement if target file is a database compliant file using an operation translator or vice-versa;
translating operating system file operations comprising of open, close, seek, tell, flush, read, write or equivalent operations to corresponding database compliant operations comprising of connect, disconnect, select, insert, update, delete or equivalent operations based on said object identifier, said record identifier or said translation buffer.
16. The method as recited in claim 15 wherein an open operation is executed as a connect operation on said target database compliant file or vice-versa.

17. The method as recited in claim 15 wherein a close operation is executed as a disconnect operation on said target database compliant file or vice-versa.

18. The method as recited in claim 15 wherein a read operation is executed as a select operation on said target database compliant file or vice-versa.

19. The method as recited in claim 15 wherein a write operation is executed as either an insert, delete or update operation on said target database compliant file or vice-versa.
PCT/IN2004/000027 2003-01-30 2004-01-29 System and method for heterogeneous data migration in real-time Ceased WO2004077216A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN122MU2003 2003-01-30
IN122/MUM/2003 2003-01-30

Publications (2)

Publication Number Publication Date
WO2004077216A2 true WO2004077216A2 (en) 2004-09-10
WO2004077216A3 WO2004077216A3 (en) 2005-05-26

Family

ID=32922936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2004/000027 Ceased WO2004077216A2 (en) 2003-01-30 2004-01-29 System and method for heterogeneous data migration in real-time

Country Status (1)

Country Link
WO (1) WO2004077216A2 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5680640A (en) * 1995-09-01 1997-10-21 Emc Corporation System for migrating data by selecting a first or second transfer means based on the status of a data element map initialized to a predetermined state
NL1007462C1 (en) * 1997-11-06 1997-11-28 Nederland Ptt Method and system for the generation of business information.
US6151608A (en) * 1998-04-07 2000-11-21 Crystallize, Inc. Method and system for migrating data

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007045860A1 (en) * 2005-10-17 2007-04-26 Celona Technologies Ltd System and method for accessing data
WO2008152515A3 (en) * 2007-06-08 2009-08-20 Accenture Global Services Gmbh Migration of legacy applications
JP2010530575A (en) * 2007-06-08 2010-09-09 アクセンチュア グローバル サービスィズ ゲーエムベーハー Legacy application migration
US9800672B2 (en) 2015-04-09 2017-10-24 Eugene Edwin Nelson Computer work distribution system and method
CN108038153A (en) * 2017-12-04 2018-05-15 北京小度信息科技有限公司 The online data moving method and device of Hbase
CN110377666A (en) * 2019-07-26 2019-10-25 浪潮软件股份有限公司 Based on the synchronous method of data between CMSP message-oriented middleware progress different source data library
CN111339029A (en) * 2020-02-18 2020-06-26 山东汇贸电子口岸有限公司 Cross-platform data migration method and system
CN113778981A (en) * 2021-03-02 2021-12-10 北京沃东天骏信息技术有限公司 Database operation method and device

Also Published As

Publication number Publication date
WO2004077216A3 (en) 2005-05-26


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase