DataMorphose - A framework for data migration

DataMorphose is an open source project that is related, though not very closely, to the SolidOpt Framework and Infrastructure. It aims to define a common set of migration steps, and its main goal is to provide a set of tools and reusable libraries that simplify the process of data migration.

What is data migration

Data migration is the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. It is required when organizations or individuals change computer systems or upgrade to new systems, or when systems merge (such as when the organizations that use them undergo a merger or takeover).

To achieve an effective data migration procedure, data on the old system is mapped to the new system, providing a design for data extraction and data loading. The design relates old data formats to the new system's formats and requirements. Programmatic data migration may involve many phases, but at a minimum it includes data extraction, where data is read from the old system, and data loading, where data is written to the new system.
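The two minimal phases described above, extraction and loading driven by a mapping design, can be sketched roughly as follows. This is only an illustration and not DataMorphose code: the table names (`old_customers`, `customers`), the column names, and the `FIELD_MAP` dictionary are all hypothetical, and both systems are simulated with in-memory SQLite databases.

```python
import sqlite3

# Hypothetical mapping design: old column name -> new column name.
FIELD_MAP = {"fullname": "name", "mail": "email"}


def migrate(old_db, new_db):
    """Extract rows from the old table and load them into the new one."""
    old_cols = list(FIELD_MAP)
    new_cols = [FIELD_MAP[c] for c in old_cols]
    # Extraction: read the mapped columns from the old system.
    rows = old_db.execute(
        "SELECT {} FROM old_customers".format(", ".join(old_cols))
    ).fetchall()
    # Loading: write the same values under the new column names.
    placeholders = ", ".join("?" for _ in new_cols)
    new_db.executemany(
        "INSERT INTO customers ({}) VALUES ({})".format(
            ", ".join(new_cols), placeholders
        ),
        rows,
    )
    new_db.commit()
    return len(rows)


def demo():
    """Build two toy databases (assumed schemas) and migrate between them."""
    old_db = sqlite3.connect(":memory:")
    new_db = sqlite3.connect(":memory:")
    old_db.execute("CREATE TABLE old_customers (fullname TEXT, mail TEXT)")
    old_db.executemany(
        "INSERT INTO old_customers VALUES (?, ?)",
        [("Ann Lee", "ann@example.org"), ("Bob Ray", "bob@example.org")],
    )
    new_db.execute("CREATE TABLE customers (name TEXT, email TEXT)")
    migrated = migrate(old_db, new_db)
    loaded = new_db.execute(
        "SELECT name, email FROM customers ORDER BY name"
    ).fetchall()
    return migrated, loaded
```

Calling `demo()` returns the number of migrated rows together with the rows as they appear in the new table. Keeping the mapping in a plain dictionary separates the migration design from the code that executes it.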

Why migration of data is important

The widespread use of computer technology over several decades has resulted in some large, complex systems which have evolved to a state where they significantly resist further modification and evolution. These Legacy Information Systems are normally mission-critical: if one of these systems stops working, the business may grind to a halt. Thus, for many organisations, decommissioning is not an option. An alternative solution is Legacy System Migration, which has recently become an important research and practical issue. In 2007, Bloor Research conducted a survey into the state of the market for data migration. At that time, there were few tools or methodologies available that were targeted specifically at data migration, and it was not an area of focus for most vendors. As a result, it was not surprising that 84% of data migration projects ran over time, over budget, or both.

DataMorphose Status

The base infrastructure is set up. There is a Subversion repository and a mailing list, which help us track the current progress. We have many ideas for how to continue developing the project, but unfortunately we do not have enough time and manpower.

The requirements and the commonalities of many existing data migration systems are analyzed and thoroughly described in Petya's diploma thesis (it's a pity we don't have an English version of it).

There are many details to consider when migrating data. However, the process can be completed successfully if a good methodology is defined. In our view, the migration should go through the following six steps:

  • Data analysis - the process of inspecting the data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. In other words, it means gathering as much information as possible about the data in the legacy system.
  • Data cleansing - once the analysis stage is done, data cleansing follows. Used mainly in databases, the term refers to identifying the incomplete, incorrect, inaccurate, or irrelevant parts of the data found during analysis, and then replacing, modifying, or deleting this dirty data.
  • Improve database schema - a schema describes the semantics of a domain (the scope of the model) as represented by a particular data manipulation technology. Although a schema is defined in a textual database language, the term is often used to refer to a graphical depiction of the database structure. In other words, the schema is the structure of the database that defines the objects in it. It is therefore very important for the database schema to be consistent and logically coherent.
  • Data mapping - the process of creating data element mappings between two data models, that is, finding the corresponding fields in the two databases and linking them.
  • Data migration - after the steps above are completed, the data is finally ready for the actual migration.
  • Data verification - the final step is verifying the data, which means checking its accuracy. This includes checking, for example, that the data was imported into the right place in the new database and that it has the expected meaning.
    Petya implemented a tiny prototype of her ideas, which is meant to be our starting point for further development. It can be found at http://svn.solidopt.org/Tools/DataMorphose/trunk . For further details on how to use our Subversion system, please go here.
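Several of the six steps listed above (analysis, cleansing, mapping, and verification) can be illustrated with a small sketch. It is independent of the prototype in the repository and makes its own assumptions: records are plain dictionaries, and the function names, field names, and `field_map` are hypothetical. The schema improvement step is not covered.

```python
def analyze(rows):
    """Data analysis: count missing (None or blank) values per field."""
    missing = {}
    for row in rows:
        for field, value in row.items():
            if value is None or str(value).strip() == "":
                missing[field] = missing.get(field, 0) + 1
    return missing


def cleanse(rows, required):
    """Data cleansing: trim strings, drop rows lacking a required field."""
    cleaned = []
    for row in rows:
        tidy = {k: v.strip() if isinstance(v, str) else v
                for k, v in row.items()}
        if all(tidy.get(f) not in (None, "") for f in required):
            cleaned.append(tidy)
    return cleaned


def map_fields(rows, field_map):
    """Data mapping: rename old fields to their counterparts in the new model."""
    return [{field_map[k]: v for k, v in row.items() if k in field_map}
            for row in rows]


def verify(source_rows, target_rows, field_map):
    """Data verification: same row count, same values under the mapped names."""
    if len(source_rows) != len(target_rows):
        return False
    return all(
        tgt[new] == src[old]
        for src, tgt in zip(source_rows, target_rows)
        for old, new in field_map.items()
    )


# Hypothetical legacy records and mapping, used only for illustration.
legacy = [
    {"fullname": " Ann Lee ", "mail": "ann@example.org"},
    {"fullname": "", "mail": "nobody@example.org"},
    {"fullname": "Bob Ray", "mail": None},
]
field_map = {"fullname": "name", "mail": "email"}

report = analyze(legacy)
cleaned = cleanse(legacy, required=["fullname", "mail"])
migrated = map_fields(cleaned, field_map)
ok = verify(cleaned, migrated, field_map)
```

Here the analysis report drives the cleansing decision (rows with missing required fields are dropped), and verification compares the migrated rows against the cleansed source rather than the raw legacy data.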

    The thesis researches the subject of data migration from one system to another. It includes a prototype of a framework, DataMorphose, which implements the very basic elements representing the data structure.

    The research describes a methodology which should be followed, while the prototype represents a way to automate the steps in the migration process. The paper discusses extensively how the data and metadata are modeled, how the data should be displayed, and ways of applying different methods of analysis and transformation.

    Some of the widely used data migration tools are analyzed along with their advantages and disadvantages. The thesis also discusses the design and architecture patterns used for the project in its current state and why they were chosen for that purpose.

    Future approaches are considered for providing a convenient user interface whose components can be easily associated with the actions they perform. The thesis also points out the plans for developing a visual programming language and the reasons that led to that decision.

    Future plans

    One of the main goals for the project is to improve the graphical interface. We aim to represent the data in a way similar to mashup technology, combining and managing data using drag-and-drop components. Another part of the plan is to extend the controller, e.g. by adding many more transformations and improving the basic ones. We should also refine the DataMorphose model by extending it with new objects that represent the database structure more accurately. A future objective of the DataMorphose framework is to evolve into a visual programming language that is platform independent and can also be used in other SolidOpt projects.

    How to find us

    Please have a look at the how to contribute page.

    Thesis.pdf (2.06 MB)