Tuesday, May 6, 2008

Flat File Schema Overview

Here I will go briefly through my Flat File Schema tags.
Root tag is Files which encapsulates files that should be validated.
File tag defines the file through Name, Alias and etc attributes.
File can have Columns as child element that contains column definitions.
Each column can have validation rules (Value, Length or Type) for the data. Each rule can include Where element that contains conditions when this rule should apply. This will allow you to build in your business logic into the validation process.
If you files have relations you can include Link element in Files element to link files. This will allow application to validate data in one file taking data from other files into account (You can reference column from the other file in the Where element of you rule).

Application runs all the checks from FFS and creates error.log file with errors and their descriptions.

Next Step
Next thing to do is an Action Element that will define the future of the record in flat file if it is not valid.
Possible flags are:
Delete Delete row with this record from validated file
Replace Replace Value with some default value or use value from replacement list
Move Append row with failed record to other file and delete from validated file
Copy Append row to other file without deleting it

How to simplify flat file exchange process

The aim of this blog is to present and explain solution for flat file validation that I had build while working as DBA. This is literally an application that can separate rows of your files into ones that pass you validation criteria and ones that don't.

When you will want to use it?
There is one main that I had in mind while writing the code:
There is one DB with data and many external sources that supply data mainly via text files. Those text files should be compliant with some rules for successful import.

You aim is to be confident in data quality (technical side at list) once you've provided your data supplier with those rules. What happens in reality? That does not work. They have document that describes data fields and what data should be there but too often something goes wrong and have to build double check process anyway. Then ask for data resupply if something went wrong.

Now how can you guaranty that you data provider ensures all rules in the document you gave him? This document should be a Flat File Schema (FFS) that can be applied to data files through easily accessible (free license + internet download) application.

Application that can allow you to build Schema (XML document) through GUI;
XML Schema for FFS that is used for syntax validation;