Import and compare XML and CSV files with Firebird DB
XMLwizard is a utility which can be used to load XML, CSV and flat textual files into Firebird databases. CSV is acronym for Comma Separated Values, but various forms of these files are supported (for example, TSV or Tab Separated Values), with user configurable delimiters, qualifiers and record separators. It supports multilevel XML files with unlimited depth of nesting.
Beside importing, XMLwizard has a powerful data comparer which can
be used to compare the data in input file against the data in the
database and show the difference in a grid. In this grid, user can
cherry-pick individual rows or even fields to be inserted, updated or
deleted from the database.
If everything worked the way you wanted: Congratulations!
If not, read on...
Main screen of XMLwizard has three sections. The main section on
the left contains a tree representing structure of the input file.
For each node, you can invoke the context menu do perform some
actions. When a node in the tree is selected, its detailed properties
are shown in the properties grid on the right-hand side. Some of the
values are read-only, while others can be set by typing directly in
the grid or selecting the value from a drop down list (fields marked
with yellow color). The third section contains a tree representing
structure of tables and columns in the target Firebird database.
Once you start the program, a wizard is shown which will guide you
through selection of source input file and target database and
configuration of various options. If you cancel the wizard, you can
restart it at any time by selecting the New project option.
In the Server field you should enter the hostname or IP address of the Firebird server. If you are using Firebird embedded, leave the field blank.
In the Database path field you should enter the path to the Firebird database on the server, or an alias if you have set it up on Firebird server.
In fields Username and Password you should enter the user credentials.
In field Connection character set you can select some of the character sets offered in the dropdown list, or enter a completely new one. The reason for this is that some newer Firebird versions might support more character sets, and you wouldn't have to upgrade XMLwizard to a newer version just to have that working.
It is recommended that you use UTF8 character set for all
connections. The reason is that this character set simply works
out-of-the-box in most of the cases, because all other character sets
can be converted to it.
On this page you need to select the type of the input file and the file itself.
Please note that memory requirements for XML files are 10 times the size of the file, because the entire file with its structure and data is parsed and loaded by a DOM parser. This means that processing at later stage works much faster. For example, if you have some XML file of 20MB, XMLwizard's parser will require about 200MB of RAM to work properly.
If you are unsure which is the character set of the input file,
try with ISO8859_1 or UTF8. UTF8 is a superset of ASCII so it is
quite possible it will work. When loading XML files, the character
set option is disabled because character set information is stored in
XML files and the parser automatically detects it.
When the option Treat empty strings as NULL is checked XMLwizard will treat all the empty fields in the input as NULL. This setting is only relevant to fields of character types (char, varchar). With dates, times and numbers it is always on. For example, empty number field always defaults to NULL and not to zero.
Option Trim input data is also only relevant for character fields. For example, when strings are converted to numbers, the leading and trailing whitespace is ignored.
When option Automatically map tables and columns after loading
is checked, XMLwizard will look for node names in the input file
and try to automatically match those to the names of columns and
tables in the Firebird database. It uses table or column name for
matching, and you can control whether this matching will be case
sensitive or not. If you turn this option off, you can always invoke
this action using the option Map all automatically from the
Input menu in the main XMLwizard's window.
You can also setup the default formats for date, time and timestamp columns. These formats will be used while analyzing the input file to determine whether some column is a date/time/timestamp column or not. If you, for example, have different formats for the date fields in the same input file, you can later modify the settings for each column separately in the properties section of the main XMLwizard screen. The date and time format uses placeholders to mark the place where a value should be found. Those placeholders are prefixed with % sign, and have the following meanings:
For example, to parse ISO date representation format (31st of December 2009 is represented as: 2009-12-31), we would use %Y-%m-%d format for dates.
The Alternative decimal separator setting is used when you
have (for example) numeric data with comma used as decimal separator
instead of dot (i.e. 123,45 instead of 123.45). This separator is
called Alternative because detection of values with dot as
decimal separator would still work properly even if you set the comma
as alternative separator.
You can setup which characters are used as delimiter, which characters are used as record qualifiers and which characters are used as record separators. For example, if the CSV file looks like this:
"XMLWizard is here";98.50;"Finally!"
Then semi-colon is the field delimiter and double quote is the text qualifier.
CSV import also shows you a read-only preview of the first few
lines of the input file, so that you can easily spot which settings
to use. Please note that the preview does not change when you change
the settings. It always remains the same, showing the input file as
Importing flat textual files assumes that you either have a file that has a fixed record length, or that records are separated by newlines. The Fixed record length setting is in charcters, to avoid potential problems with variable length character sets like UTF-8. On the other hand, the First record offset setting is an offset of the first record from the start of the input file, and it is given in bytes. The reason for this is that you might have some file with a binary data in header which you might want to skip. Since it cannot be converted into characters there is no way to count them. A usual use case for this is when importing DBF files directly.
In the upper part of the screen you should setup the offsets of
each field in a record. Again, this works on character basis, and
WYSIWYG principle (what you see on the screen is what you get
imported in the database later). Note that one of the rows can be
marked if you click on it. This has no significant effect on the
import process, it's only visual aid used to help you when you have
really long lines in input file.
To check whether the file was loaded properly, you can always
preview the program's view of the data by selecting the Show sample
data option in the Input menu. Please note that this dialog
shows first 20 values found for each column. Therefore, if some of the
fields in input file are empty or not supplied (common in XML files),
this preview would not show the actual rows that would be inserted in
On this screen you should select whether you want to do quick data import into the database or you wish to compare the data. The quick data import only gives a progress bar and data is imported into the database without any prompting (except in case of errors). Many times, it is smart to first do the comparison to have a preview of what's going to happen.
Quick data import has three options:
If both insert and update is selected then XMLwizard first tries to insert and if that fails it then tries to update without issuing an error. Therefore, selecting both options works like "overwrite" mode - adding new and updating the existing records in the database.
When error reporting is turned on, the import stops at every error
asking you what to do. However, you can decide to skip different
kinds of errors. It is always a good idea to start with error
reporting on, and turn some of the "expected" kinds of
errors off during the import. Of course, if you just want a quick and
dirty import for some testing purposes, it's easier just to ignore
all the errors and let it run.
When import is complete, the program will report number of records
that were successfully imported and ask you if you wish to commit the
transaction. So, you can still decide to abort if something has gone
If you want to Compare data and select records manually you
need to select whether you want to see the records that are the same,
different (same primary key, but different values in some other
columns) or not found in database table (but present in the input
All those options require a single scan of the input file. However,
the records are shown in the resulting grid in the order they are read
from the input file, so you might have a hard time finding, for
example, five new records in a pile of 10000 existing identical
It's better to select to see only the new records in this case.
You can also select the option to see which records exist in
database, but are not present in the input file. This requires a full
scan of the database table, which is then checked against cache of
primary keys found in the input file. If you have a lot of records in
the database, this can be very slow, so it is turned off by default.
Data comparer loads all the data from the input file and checks
against Firebird database. If record has all the fields with the same
values, it is identical. If record has same values for primary
key column(s), it is different. If primary key is not found in
the database table, the record is new. Additionally, comparer
can read in all the records from database table and check that against
the input file to see which records are not found in the input file.
That option is off by default because it can work really slow if you
have a lot of records in the database.
On data comparer screen you can select individual cells in the table
to update the value in database with the value from file. Those fields
are marked with yellow color. The value in top row (light-yellow) is
the value from input file, while the value in bottom row (dark-yellow)
is the value from database. White cells mark the values that are same
in both input file and database, but are a part of row that has
different values. When all values are updated into database, both rows
will be removed from the grid automatically. You can hold down the Ctrl
key to mark multiple disconnected cells for update.
You can also use the Insert and Delete buttons to insert new (red)
cells, and delete rows from database table that are not found in the
input file (blue).
When multiple cells (or entire rows or columns) are selected the
Insert button would only insert new rows in that selection, Update
button would only update the different values in that selection, and
Delete button would only delete records not found in the input file.
Using the context menu, you can also copy the selected cells, and
paste them into some other program like text editor or a spreadsheet
(OpenOffice Calc, Microsoft Excel, etc.)
Tip: If, for some fields, you want only to see the values in the compare grid, but do not wish to consider the row as different, you need to set the Include when comparing option to No in the Field properties grid on the main program screen:
Virtual columns enable you to add a virtual column to your input file. While this column is not really a part of the input file, the engine will consider it as such. This enables you to provide defaults for some table columns and also to compare table columns against some fixed value. Here are the meanings of the virtual column settings:
Virtual column name is the name that will be used to display
the column in the input file structure tree on the left side of the
main screen. You can use any name you like.
Database table column mapping maps this virtual column to
some database table's column. This option is only enabled when the
parent input node is already mapped to some table. You can also set or
change this mapping later, after you add the virtual column.
Type of Virtual data can be a specific value, NULL
When you create a virtual column having Specific value or NULL,
and you map it to the database table's column, that column will be set
to Specific value or NULL when records are imported.
Database column's values will also be compared against Specific
value or NULL when comparing data (unless you set Include
when comparing setting to No in field properties grid).
When you create a virtual column with data set up as None,
it will only be used when comparing. It would not compare database
column's data to anything, but rather just display it on the screen.
This is useful when you need some columns from database to have a
proper context while comparing, but those columns are not present in
the input file.
When working with nested XML files, it's a common requirement that
you need to use the value of some parent node's fields to insert in the
row in the database table mapped to the child node. This field is
usually a foreign key referencing the master table in master-detail
relationship, and it is called Master column in XMLwizard
terminology. Consider the following XML file structure and database
CREATE TABLE invoice_item (
invoice_number varchar(20) not null,
foreign key (invoice_number) references
In the above example, invoice number is needed in order to insert
records into invoice_item table. However, since it is a nested XML
file, that data is not found at the same level. XMLwizard enables you
to create a link to any value which is above in the hierarchy by using Map
master columns option in the Input menu. You can also use
the context menu of the input tree node to get the same option.
In the above example, we would add a master column referencing
invoice number, so the engine will see a structure of the tree like
<invoice_number references="invoice.number" />
XMLwizard supports unlimited depth of nesting, so you can reference
fields that are more than one level up in the hierarchy.