Creating a Frequency Count from a Dezign For Databases ERD using ApplyXSL
By Jeffrey McArthur
JSM Software is dedicated to Total Quality Management (TQM). This includes proper database documentation. One useful tool in the analysis of a database structure is a frequency count. That is, a table showing the frequency of every field in the database. This shows what fields are populated and what fields are unused. Writing the SQL to generate a frequency count can be tedious. This paper discusses a method to generate the frequency script programmatically using Dezign for Databases and ApplyXSL.
Datanamic (www.datanamic.com) created a product named Dezign for Databases. The tool is inexpensive database development tool that allows the creation and modifications of entity relationship diagrams. One significant advantage of Dezign for Databases is the data is stored as an XML file.
A Dezign for Databases file has the extension of ".dez". The top-level structure of the file looks like this:
- <dezign>
- <VERSION>
- Various header information
- <DIAGRAMS>
- <DIAGRAM>
- Information about a diagram
- </DIAGRAM>
- Additional diagrams
- <DIAGRAM>
- </DIAGRAMS>
- <DOMAINS>
- Information about domains
- </DOMAINS>
- <ENTITIES>
- <ENT>
- <NAME> Name of the entity
- <ID> Id number of the entity
- <DESC> Description of the entity
- Other entity information
- <ATTRIBUTES>
- <ATTR>
- <NAME> Name of the attribute
- <ID> Id number of the attribute
- <DESC> Description of the attribute
- <TODO> To Do list for attribute
- <DOMAINID> Id of the domain
- <DT>
- <DTLISTNAME> Attribute type
- Other attribute properties
- </DT>
- </ATTRIBUTES>
- Other entity information including indexes
- </ENT>
- Additional entities
- <ENT>
- </ENTITIES>
- Other diagram information like sub-categories, text areas, stamps, procedures, and relationships
- </VERSION>
- <VERSION>
- </dezign>
Every XSL transformation specifies the namespace and the output type. Since the script is generating an SQL script from an XML input file, the output type is text, and we do not want the XML declaration. The lines below are the start of the XSL transformation:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"
omit-xml-declaration="yes"
encoding="UTF-8"
indent="yes"/>
The logic to create the frequency count is simple. The code will process each entity. For each entity, the transformation generates a select query for each attribute. Dezign for Databases wraps all the entities in an <ENTITIES> tag. The XSL transformation only needs to select the <ENTITIES> tag.
<xsl:template match="/">
<xsl:apply-templates select="//ENTITIES"/>
</xsl:template>
Another template handles the <ENTITES> tag:
<xsl:template match="ENTITIES">
<xsl:apply-templates select="ENT"/>
</xsl:template>
Processing each entity is a bit more complicated. The transformation generates a "Union" statement following each entity, except for the last one.
<xsl:template match="ENT">
<xsl:apply-templates select="ATTRIBUTES"/>
<xsl:if test="position()!=last()"><xsl:text>union
</xsl:text></xsl:if>
</xsl:template>
The <ATTRIBUTES> wrapper element surrounds the <ATTR> elements. Both the <ENT> element and the <ATTR> element use the <NAME> element. There are several ways to handle this. The paper uses the "mode" attribute on the XSL template to specify the proper way to handle the <NAME> element. The select statement created by the transformation should look something like this:
select
'entity name' as TableName,
'attribute name' as FieldName,
count(attribute name) as RecCount,
count(distinct attribute name) as DistinctRecCount
from
entity name
Two other issues need addressing. The first is the issue of "memo" and "image" fields. Microsoft uses the type of "TEXT" and "NTEXT" to refer to memo fields. Microsoft SQL does not support aggregate functions on fields of type TEXT, NTEXT, or IMAGE. The script must skip fields of that type. The code for the attribute looks into the <DT> element and then the <DTLISTNAME> to check the field type.
<xsl:template match="ATTRIBUTES">
<xsl:apply-templates select="ATTR[
(child::DT/child::DTLISTNAME !='text')
and
(child::DT/child::DTLISTNAME !='ntext')
and
(child::DT/child::DTLISTNAME !='image')]"/>
</xsl:template>
The processing of the <ATTR> element just calls the template for the <NAME> element and checks to see if the SQL needs a "union" statement. Two different contexts use the <NAME> element. The transformation must distinguish between the two contexts by means of the mode attribute on the apply-templates statement. For the <ATTR> element, the apply-templates statement for the <NAME> element sets the mode to "attr".
<xsl:template match="ATTR">
<xsl:apply-templates mode="attr" select="NAME"/>
<xsl:if test="position()!=last()"><xsl:text>union
</xsl:text></xsl:if>
</xsl:template>
The template with the mode of "justname" outputs just the value of the name element without any additional processing.
<xsl:template mode="justname" match="NAME">
<xsl:value-of select="."/>
</xsl:template>
The template for the <NAME> element with a mode of "attr" strings together all the pieces of the select statement. Microsoft SQL uses square brackets, [ and ] to handle field names with spaces and special characters. The template wraps the attribute and entity names in square brackets to avoid any problems.
<xsl:template mode="attr" match="NAME">
<xsl:text>select '</xsl:text>
<xsl:apply-templates
mode="justname"
select="parent::ATTR/parent::ATTRIBUTES/parent::ENT/NAME"/>
<xsl:text>' as TableName, '</xsl:text>
<xsl:value-of select="."/>
<xsl:text>' as FieldName, count([</xsl:text>
<xsl:value-of select="."/>
<xsl:text>]) as RecCount, count(distinct [</xsl:text>
<xsl:value-of select="."/>
<xsl:text>]) as DistinctRecCount from [</xsl:text>
<xsl:apply-templates
mode="justname"
select="parent::ATTR/parent::ATTRIBUTES/parent::ENT/NAME"/>
<xsl:text>] where </xsl:text>
<xsl:text>([</xsl:text>
<xsl:value-of select="."/>
<xsl:text>] is not null)</xsl:text>
<xsl:text>
</xsl:text>
</xsl:template>
Datanamic created several reverse-engineering tools. Importer ER Scripts reads in a SQL script generated by Microsoft SQL and generates the ERD. Microsoft SQL 2000 ships with a sample database named Northwind. Importer ER Scripts generated the ERD below:

The free tool ApplyXSL (http://www.jsm-software.com/downloads/software/software.htm) simplifies the process of applying the XSL transformation. Below is a screen capture of the program:

The "..." buttons open file-find dialogs. The process of generating the SQL requires selecting the appropriate files. Note: Dezign for Database ".dez" files do not have a schema, therefore the validate option must be unchecked. The screen capture shows the two input files and one output file:

Pressing the "Apply Transform" button generates the SQL script shown on the Output tab of the program:

Using the XSL transformation described in this paper on the Northwind ERD generated the following frequency table:
|
TableName |
FieldName |
RecCount |
DistinctRecCount |
|
Categories |
CategoryID |
8 |
8 |
|
Categories |
CategoryName |
8 |
8 |
|
CustomerCustomerDemo |
CustomerID |
0 |
0 |
|
CustomerCustomerDemo |
CustomerTypeID |
0 |
0 |
|
CustomerDemographics |
CustomerTypeID |
0 |
0 |
|
Customers |
Address |
91 |
91 |
|
Customers |
City |
91 |
69 |
|
Customers |
CompanyName |
91 |
91 |
|
Customers |
ContactName |
91 |
91 |
|
Customers |
ContactTitle |
91 |
12 |
|
Customers |
Country |
91 |
21 |
|
Customers |
CustomerID |
91 |
91 |
|
Customers |
Fax |
69 |
69 |
|
Customers |
Phone |
91 |
91 |
|
Customers |
PostalCode |
90 |
86 |
|
Customers |
Region |
31 |
18 |
|
Employees |
Address |
9 |
9 |
|
Employees |
BirthDate |
9 |
9 |
|
Employees |
City |
9 |
5 |
|
Employees |
Country |
9 |
2 |
|
Employees |
EmployeeID |
9 |
9 |
|
Employees |
Extension |
9 |
9 |
|
Employees |
FirstName |
9 |
9 |
|
Employees |
HireDate |
9 |
8 |
|
Employees |
HomePhone |
9 |
9 |
|
Employees |
LastName |
9 |
9 |
|
Employees |
PhotoPath |
9 |
5 |
|
Employees |
PostalCode |
9 |
9 |
|
Employees |
Region |
5 |
1 |
|
Employees |
ReportsTo |
8 |
2 |
|
Employees |
Title |
9 |
4 |
|
Employees |
TitleOfCourtesy |
9 |
4 |
|
EmployeeTerritories |
EmployeeID |
49 |
9 |
|
EmployeeTerritories |
TerritoryID |
49 |
49 |
|
Order Details |
Discount |
2155 |
11 |
|
Order Details |
OrderID |
2155 |
830 |
|
Order Details |
ProductID |
2155 |
77 |
|
Order Details |
Quantity |
2155 |
55 |
|
Order Details |
UnitPrice |
2155 |
116 |
|
Orders |
CustomerID |
830 |
89 |
|
Orders |
EmployeeID |
830 |
9 |
|
Orders |
Freight |
830 |
799 |
|
Orders |
OrderDate |
830 |
480 |
|
Orders |
OrderID |
830 |
830 |
|
Orders |
RequiredDate |
830 |
454 |
|
Orders |
ShipAddress |
830 |
89 |
|
Orders |
ShipCity |
830 |
70 |
|
Orders |
ShipCountry |
830 |
21 |
|
Orders |
ShipName |
830 |
90 |
|
Orders |
ShippedDate |
809 |
387 |
|
Orders |
ShipPostalCode |
811 |
84 |
|
Orders |
ShipRegion |
323 |
19 |
|
Orders |
ShipVia |
830 |
3 |
|
Products |
CategoryID |
77 |
8 |
|
Products |
Discontinued |
77 |
2 |
|
Products |
ProductID |
77 |
77 |
|
Products |
ProductName |
77 |
77 |
|
Products |
QuantityPerUnit |
77 |
70 |
|
Products |
ReorderLevel |
77 |
7 |
|
Products |
SupplierID |
77 |
29 |
|
Products |
UnitPrice |
77 |
62 |
|
Products |
UnitsInStock |
77 |
51 |
|
Products |
UnitsOnOrder |
77 |
10 |
|
Region |
RegionDescription |
4 |
4 |
|
Region |
RegionID |
4 |
4 |
|
Shippers |
CompanyName |
3 |
3 |
|
Shippers |
Phone |
3 |
3 |
|
Shippers |
ShipperID |
3 |
3 |
|
Suppliers |
Address |
29 |
29 |
|
Suppliers |
City |
29 |
29 |
|
Suppliers |
CompanyName |
29 |
29 |
|
Suppliers |
ContactName |
29 |
29 |
|
Suppliers |
ContactTitle |
29 |
15 |
|
Suppliers |
Country |
29 |
16 |
|
Suppliers |
Fax |
13 |
13 |
|
Suppliers |
Phone |
29 |
29 |
|
Suppliers |
PostalCode |
29 |
29 |
|
Suppliers |
Region |
9 |
8 |
|
Suppliers |
SupplierID |
29 |
29 |
|
Territories |
RegionID |
53 |
4 |
|
Territories |
TerritoryDescription |
53 |
52 |
|
Territories |
TerritoryID |
53 |
53 |
XSL transformations automate the tedious process of generating a frequency count of a database. Microsoft SQL Server 2000 is limited in the number of unions allowed. For a database with a large number of fields, it is possible to exceed the capacity of Microsoft SQL Server. However, for most situations, this process works quickly.
The author, Jeffrey McArthur, is President of JSM Software, a company focused on affordable business intelligence solution using Microsoft SQL Server. You can reach Jeffrey at jeffmcarthur@jsm-software.com. For more information on JSM Software, please see http://www.jsm-software.com or contact the sales department at 1-410-290-6958.