Friday, February 26, 2010

Development plans for 2010. Make a Wish!

I want to get feedback to plan development in 2010.

From the end of 2009 I have stopped developing new features without user request, so if you think something is missing make sure you tell about it. There are still a lot of things that can be done to improve Flat File Checker.
I have number areas that I think should be developed/improved in 2010, but I am not sure whether they are relevant to you:

See descriptions for each point bellow.

1. Match & Merge (De-duplication)
a. Duplicates Report - implemented - see User Guide->Add Actions->Report Duplicates.
b. Fuzzy Logic / Heuristic
- Soundex - implemented - see example of a derived field with Soundex function in the User Guide->Calculated Field->Add Calculated Field;
- Levenshtein (distance between two strings)
- Dictionaries (matches between official names and nicknames, i.e.: Robert - Rob - Robbie - Robin - Rupert - Bob - Bobby - Bert.) - implemented - how to match records using special values;
c. Some kind of conditional logic if it is not too complicated, i.e. If Forename is missing, then check whether Initials are the same - implemented - see special values for matching.

2. Web Services (WS)
a. Basic functionality to upload Flat File Schema to the server.
b. Basic functionality to upload data files against Schema that is already on the server.
c. Run validation of the file set against the schema.

Thanks to Chris (RacerX64), who already spent his weekend (man, I hope you got bonus for that) to write the most of needed functionality, so it should take couple of days to accomplish this task. The biggest challenge with WS is to set up public server which could be available to everybody. If somebody can help with IIS hosting, it will be highly appreciated. IF WS is UP and running I will think of basic web interface for file upload and error reporting.

3. Data Error Log & Execution
a. Make HTML or XML+XSLT version of report that will be user friendly enough to send reports to data providers. - implemented. Now the output format of the log depends on its extension and can be Text, Xml or Htlm (see User Guide->Create New Schema->Log Formats)
b. Replace current execution logging with standard Tracing mechanism which is more flexible and robust way of logging.

4. Data File Preview
a. Implemented - Add a form to GUI that will show data from the file in the table with errors being highlighted.
b. Add a form for managing fields setting of the fixed position file.

5. Documentation
a. I want to make it easy to use and read. Please help me with this!
b. I have started shooting short videos to build up a tutorial. I think there is space for about 6 more movies. Please leave your feedback or produce your own video.
c. I need to make a sample schema and data set that everybody could see and play with.
d. If you use Flat File Checker extensively, please write an article or a business case and share it with other users. You can post them on the forum, in your blog, or here.

6. XSLT Schema
a. I don't know whether somebody uses IE to view the schema, but I find it quite handy. I want to change the way schema is presented slightly. Your thoughts on ways to improve transformation will be appreciated.

7. Data Rule Templates
a. The last, but not the least I need to create a GUI to create Query rules template files. This is a very powerful peace of functionality and I use it for data QA of marketing selections. Though it is possible to write template file from scratch or use Custom Query Xml as a base, it is not the easiest task and not something I like to do. Feel free to ask about Template Queries, while GUI functionality for them is under development consideration.
b. I’m also thinking about generic rule templates that will cover other then Query Rule templates.

Please post your wishes or vote here. You feedback is much appreciated!


UPD - June 2010:
I have done almost everything I wanted on duplicate records matching (releases after v.0.7.0.9), so will start working on improving of error logs and execution logs. Any requests are welcome.
UPD2 - June 2010:
Logs are now available in Html and Xml formats (releases after v.0.7.1.0)
UPD3 - Jan 2011
Now data preview is available which allows to edit original data files directly in Flat File Checker (see details here).

Thursday, February 11, 2010

Video Tutorials for Flat File Checker

I am not sure whether this will be very helpful but I decided to make a number of videos that show how to use Flat File Checker.

Firstly, I want to thank guys who made and support CamStudio which was used to create these videos.


Secondly, I want you to see the amazing microphone that was used to capture my voice:



I hope you are impressed and ready to watch the videos:
  • How to create a new schema and add file:

  • How to add lookup and link files

  • How to create data rule

  • How to add regular expressions
I hope this will be helpful to start worknig with Flat File Checker. I am planning to create onther 4-6 video tutorials on other topics. If you have suggestions for these topics please let me know.

Sunday, February 7, 2010

Batch Data Validation with Flat File Checker

Title: Batch Data Validation with Flat File Checker
Author:      Bulat Yapparov, Chris Ellison
Email:       byapparov@gmail.com
Member ID:   5425273
Language:    VB.Net, VBScript
Platform:    Windows
Technology:  VBScript, Flat File Checker, Scheduled Tasks
Level:       Intermediate
Description: This is a real life example of data validation using conslole version of Flat File Checker and VBScript
Section      Database
SubSection   Database
License:     CPOL (default - select your desired license from our list)

Introduction

In my previous article I have mentioned that data validation with Flat File Checker can be done through command line.

Chris Ellison wrote a brilliant VBScript based solution (download) that allows validating multiple data files using different Flat File Checker schemas depending on file name prefix.
This script can be used as a source of inspiration if you want to automate your data validation process.


Background

Sometimes it is required to validate many similar data files, which is not possible through Flat File Checkers GUI.
However, there is an easy solution for that with console version (FlaFi.exe).

Main console tokens are parameters that you can pass to console application:
-silent - executes schema without echo to console window and closes console when validation is complete.
-errors - echoes data errors to console window.

Custom tokens are user specified tokens that can be used to pass values to control schema's attributes. Custom tokens allow parameterising the schema and reusing it for similar data files. You can pass values through command line and change parameters of validation i.e.: file name suffix, product code, dates, etc.
All you need to create a custom token is to:
1) add Global Variables to you schema.
2) call FlaFi with command that has the following format:

C:\Program Files\Flat File Checker\FlaFi {Schema Path} -{Variable Name} {Variable Value}
3) add global variable named "campaign" to the schema.
4) use "Constant" function in VB Expressions for any attribute in the schema:
FileName=[Constant('campaign')]-import.csv
5) run validation:
C:\Program Files\Flat File Checker\FlaFi {Schema Path} 
                                                   -campaign SRP54A
Let's have a look at the most important function of the script - ValidateFile

Using the code

Download the solution and see instructions file
for more details on how to try the code.
ValidateFile function has following parameters:
strFolder - Folder that contains schema files
strSchema - Schema file name
strFileDir - Directory that contains files
strFile - File Name
strLogFileDir- Folder that contains logs

Private Function ValidateFile (strFolder, strSchema, strFileDir, _
                               strFile, strLogFileDir)
 'Execute FFC and return 1 of 4 return values
 'FFC ERRORLEVEL 0 = "VALID" File Valid
 'FFC ERRORLEVEL 1 = "DATA" Data Exception
 'FFC ERRORLEVEL 2 = "EXEC" Execution Exception
 'FFC ERRORLEVEL > 2 = "SEVR" Severe Error
 dim sh
 dim strCommand
 dim return

 ValidateFile = "NO_SCHEMA"

 if SchemaFileExists(strSchemaFolder, strSchema) then
 Set sh = CreateObject("WScript.Shell")
 WriteLogEntry objLogFile, "Validating File " & strFile & _
               " against FFC Schema " & strSchema
 strCommand = Chr(34) & strFlatFileExe & Chr(34) & " -silent " & _
   strFolder & strSchema  & " -InFileName " & strFile & " -InFileDir " & _
   strFileDir & " -InLogDir " & strLogFileDir
 WriteLogEntry objLogFile, "Running Command " & strCommand
 'wscript.echo "Running Command " & strCommand

 return = sh.Run(strCommand, 0, True)

 if return = 0 then
       WriteLogEntry objLogFile, _
         "File Successfully Validated Return Code = " & return
    ValidateFile = "VALID"
 else
  if return = 1 then
   WriteLogEntry objLogFile,"Data File Validation Failed, Return Code " _
    & return
   ValidateFile = "DATA"
  else
   if return = 2 then
    WriteLogEntry objLogFile,"Execution error, Return Code " & return
    ValidateFile = "EXEC"
   else
    WriteLogEntry objLogFile,"Serious, Return Code " & return
    ValidateFile = "SEVR"
   end if
  end if
 end if
 set sh = nothing

 else
 ValidateFile = "NO_SCHEMA"
end if
end function


Points of Interest

It will be useful to transfer this solution to VBA which provides better environment for debugging and logging.
See forum for more information about Flat File Checker command line execution:

Eval3 wrapper

Title:       Eval3  wrapper
Author:      Bulat Yapparov, Chris Ellison
Email:       byapparov@gmail.com
Member ID:   5425273
Language:    Vb.Net
Platform:    Windows
Technology:  dotNet
Level:       Intermediate
Description: Evaluator for multiple VB expressions based on Eval3 library
Section      Type the Code Project Section you Wish the Article to Appear
SubSection   Type the Code Project SubSection you Wish the Article to Appear
License:     CPOL (default - select your desired license from our list)

Introduction

Eval3 library writen by Pascal Ganaye allows to parse and evaluate VB code, which is great but often it is not enough.
What I needed was a class that would interprit a string that contains multiple expressions inclosed in square brakets.
For example this string:
GIS_[Format(Now, 'yy-MM-dd')]-[Field('ID')]-[Field('Source')]
Will be evaluated to something like this:
GIS_10-01-13-5324523-DFS235F

Background

Lets assume we have an exprsion similar to the shown example above which we need to evaluate multiple times.
I have writen a class that will parse the whole expression once and store "formulas" for future evaluations.
This sighnificantly reduces execution time while keeping it really simple

Using the code

Download Code: Eval3 Wrapper; Evalutation functions

It is very easy to use EvalFormulaCollection class once you have Eval3 added to your project.
You just need to:
  • Create new instance of EvalFormulaCollection class with expression that you need to evaluate
  • Add "environment" object that provides functions that can be used in expression
  • Call Initiate() method that will make reusable array of parsed "formulas".
  • Call Value() property to get the interpreted value of the expression which is evaluated without parsing!
Code example:
' Test evaluation of the expression with multiple VB Expressions
Sub Test()
 Dim c As New MyClass
 dim e as New evaluationFunctions
 Dim expression As String = _ 
     "GIS_[Format(Now, 'yy-MM-dd')]-[Field('ID')]-[Field('Source')]"
 Dim ev as New EvalFormulaCollection(expression)
 ev.AddEnvironment(c)
 ev.AddEnvironment(e)
 ev.Initiate()
 MsgBox ev.Value()
End Sub

Points of Interest

The best thing about this code is its size. It is quite impressive what you can do with Regex class from RegularExpressions library with one line of code:
' Find all expression in square brackets
Public Sub New(ByVal expression As String)
     ...
    _matches = Regex.Matches(_expression, "[\[][^\[\]]{1,}[\]]")
End Sub

Friday, February 5, 2010

Version 0.7.0 is released

New Functionality

1. Option to overwrite schema attributes in command line
Now you can parameterize you schema and control any of its attributes through command line parameters.

I you want to pass value to the schema through command line you need to do the following:
1) Add Global Variable (in Options) to your schema to define a part of the file name i.e.: filesufix
2) Change file name in the schema to something like this:
Name="MyFile-[Constant('filesufix')].csv"
3) Call FlaFi in command line like this:
C:\Program Files\Flat File Checker\FlaFi "SchemaPath.xml" -filesufix 20100124 

Does any other application have something cool like that?

2. Drag and Drop Rules
Now you can drag & drop rules in the rule builder which is an amasing experience!

3. StdOut Show Errors

Now you can use -errors token in the console version of the application to echo data errors.

Bug Fixes
  • Command line execution not returning errorlevel
  • "Has Header" checkbox not preserved
  • Global Variable without value - unhandled exception
  • Start and End positions are not saved.
  • Start Column default value
  • Custom Query Screen - improved interface
  • Mid function in the VB Expression
  • Constant not working in data and error file specifications
See project's issue tracker for details.

Join project group to dowload this version!