Creating a Frequency Count from a Dezign For Databases ERD using ApplyXSL

By Jeffrey McArthur

JSM Software is dedicated to Total Quality Management (TQM). This includes proper database documentation. One useful tool in the analysis of a database structure is a frequency count. That is, a table showing the frequency of every field in the database. This shows what fields are populated and what fields are unused. Writing the SQL to generate a frequency count can be tedious. This paper discusses a method to generate the frequency script programmatically using Dezign for Databases and ApplyXSL.

Datanamic (www.datanamic.com) created a product named Dezign for Databases. The tool is inexpensive database development tool that allows the creation and modifications of entity relationship diagrams. One significant advantage of Dezign for Databases is the data is stored as an XML file.

A Dezign for Databases file has the extension of ".dez". The top-level structure of the file looks like this:

  • <dezign>
    • <VERSION>
      • Various header information
      • <DIAGRAMS>
        • <DIAGRAM>
          • Information about a diagram
        • </DIAGRAM>
        • Additional diagrams
      • </DIAGRAMS>
      • <DOMAINS>
        • Information about domains
      • </DOMAINS>
      • <ENTITIES>
        • <ENT>
          • <NAME> Name of the entity
          • <ID> Id number of the entity
          • <DESC> Description of the entity
          • Other entity information
          • <ATTRIBUTES>
            • <ATTR>
            • <NAME> Name of the attribute
            • <ID> Id number of the attribute
            • <DESC> Description of the attribute
            • <TODO> To Do list for attribute
            • <DOMAINID> Id of the domain
            • <DT>
              • <DTLISTNAME> Attribute type
              • Other attribute properties
            • </DT>
          • </ATTRIBUTES>
          • Other entity information including indexes
        • </ENT>
        • Additional entities
      • </ENTITIES>
      • Other diagram information like sub-categories, text areas, stamps, procedures, and relationships
    • </VERSION>
  • </dezign>

Every XSL transformation specifies the namespace and the output type. Since the script is generating an SQL script from an XML input file, the output type is text, and we do not want the XML declaration. The lines below are the start of the XSL transformation:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"
      omit-xml-declaration="yes"
      encoding="UTF-8"
      indent="yes"/>

The logic to create the frequency count is simple. The code will process each entity. For each entity, the transformation generates a select query for each attribute. Dezign for Databases wraps all the entities in an <ENTITIES> tag. The XSL transformation only needs to select the <ENTITIES> tag.

<xsl:template match="/">
      <xsl:apply-templates select="//ENTITIES"/>
</xsl:template>

Another template handles the <ENTITES> tag:

<xsl:template match="ENTITIES">
      <xsl:apply-templates select="ENT"/>
</xsl:template>

Processing each entity is a bit more complicated. The transformation generates a "Union" statement following each entity, except for the last one.

<xsl:template match="ENT">
      <xsl:apply-templates select="ATTRIBUTES"/>
      <xsl:if test="position()!=last()"><xsl:text>union
</xsl:text></xsl:if>
</xsl:template>

The <ATTRIBUTES> wrapper element surrounds the <ATTR> elements. Both the <ENT> element and the <ATTR> element use the <NAME> element. There are several ways to handle this. The paper uses the "mode" attribute on the XSL template to specify the proper way to handle the <NAME> element. The select statement created by the transformation should look something like this:

select
'entity name' as TableName,
'attribute name' as FieldName,
count(attribute name) as RecCount,
count(distinct attribute name) as DistinctRecCount
from
entity name

Two other issues need addressing. The first is the issue of "memo" and "image" fields. Microsoft uses the type of "TEXT" and "NTEXT" to refer to memo fields. Microsoft SQL does not support aggregate functions on fields of type TEXT, NTEXT, or IMAGE. The script must skip fields of that type. The code for the attribute looks into the <DT> element and then the <DTLISTNAME> to check the field type.

<xsl:template match="ATTRIBUTES">
      <xsl:apply-templates select="ATTR[
            (child::DT/child::DTLISTNAME !='text')
            and
            (child::DT/child::DTLISTNAME !='ntext')
            and
            (child::DT/child::DTLISTNAME !='image')]"/>
</xsl:template>

The processing of the <ATTR> element just calls the template for the <NAME> element and checks to see if the SQL needs a "union" statement. Two different contexts use the <NAME> element. The transformation must distinguish between the two contexts by means of the mode attribute on the apply-templates statement. For the <ATTR> element, the apply-templates statement for the <NAME> element sets the mode to "attr".

<xsl:template match="ATTR">
      <xsl:apply-templates mode="attr" select="NAME"/>
      <xsl:if test="position()!=last()"><xsl:text>union
</xsl:text></xsl:if>
</xsl:template>

The template with the mode of "justname" outputs just the value of the name element without any additional processing.

<xsl:template mode="justname" match="NAME">
      <xsl:value-of select="."/>
</xsl:template>

The template for the <NAME> element with a mode of "attr" strings together all the pieces of the select statement. Microsoft SQL uses square brackets, [ and ] to handle field names with spaces and special characters. The template wraps the attribute and entity names in square brackets to avoid any problems.

<xsl:template mode="attr" match="NAME">
      <xsl:text>select '</xsl:text>
      <xsl:apply-templates
            mode="justname"
            select="parent::ATTR/parent::ATTRIBUTES/parent::ENT/NAME"/>
      <xsl:text>' as TableName, '</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>' as FieldName, count([</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>]) as RecCount, count(distinct [</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>]) as DistinctRecCount from [</xsl:text>
      <xsl:apply-templates
            mode="justname"
            select="parent::ATTR/parent::ATTRIBUTES/parent::ENT/NAME"/>
      <xsl:text>] where </xsl:text>
      <xsl:text>([</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>] is not null)</xsl:text>
      <xsl:text>
</xsl:text>
</xsl:template>

Datanamic created several reverse-engineering tools. Importer ER Scripts reads in a SQL script generated by Microsoft SQL and generates the ERD. Microsoft SQL 2000 ships with a sample database named Northwind. Importer ER Scripts generated the ERD below:

Northwind ERD

The free tool ApplyXSL (http://www.jsm-software.com/downloads/software/software.htm) simplifies the process of applying the XSL transformation. Below is a screen capture of the program:

ApplyXSL Start Screen

The "..." buttons open file-find dialogs. The process of generating the SQL requires selecting the appropriate files. Note: Dezign for Database ".dez" files do not have a schema, therefore the validate option must be unchecked. The screen capture shows the two input files and one output file:

ApplyXSL Input and Output Specified

Pressing the "Apply Transform" button generates the SQL script shown on the Output tab of the program:

ApplyXSL Output Generated

Using the XSL transformation described in this paper on the Northwind ERD generated the following frequency table:

TableName

FieldName

RecCount

DistinctRecCount

Categories

CategoryID

8

8

Categories

CategoryName

8

8

CustomerCustomerDemo

CustomerID

0

0

CustomerCustomerDemo

CustomerTypeID

0

0

CustomerDemographics

CustomerTypeID

0

0

Customers

Address

91

91

Customers

City

91

69

Customers

CompanyName

91

91

Customers

ContactName

91

91

Customers

ContactTitle

91

12

Customers

Country

91

21

Customers

CustomerID

91

91

Customers

Fax

69

69

Customers

Phone

91

91

Customers

PostalCode

90

86

Customers

Region

31

18

Employees

Address

9

9

Employees

BirthDate

9

9

Employees

City

9

5

Employees

Country

9

2

Employees

EmployeeID

9

9

Employees

Extension

9

9

Employees

FirstName

9

9

Employees

HireDate

9

8

Employees

HomePhone

9

9

Employees

LastName

9

9

Employees

PhotoPath

9

5

Employees

PostalCode

9

9

Employees

Region

5

1

Employees

ReportsTo

8

2

Employees

Title

9

4

Employees

TitleOfCourtesy

9

4

EmployeeTerritories

EmployeeID

49

9

EmployeeTerritories

TerritoryID

49

49

Order Details

Discount

2155

11

Order Details

OrderID

2155

830

Order Details

ProductID

2155

77

Order Details

Quantity

2155

55

Order Details

UnitPrice

2155

116

Orders

CustomerID

830

89

Orders

EmployeeID

830

9

Orders

Freight

830

799

Orders

OrderDate

830

480

Orders

OrderID

830

830

Orders

RequiredDate

830

454

Orders

ShipAddress

830

89

Orders

ShipCity

830

70

Orders

ShipCountry

830

21

Orders

ShipName

830

90

Orders

ShippedDate

809

387

Orders

ShipPostalCode

811

84

Orders

ShipRegion

323

19

Orders

ShipVia

830

3

Products

CategoryID

77

8

Products

Discontinued

77

2

Products

ProductID

77

77

Products

ProductName

77

77

Products

QuantityPerUnit

77

70

Products

ReorderLevel

77

7

Products

SupplierID

77

29

Products

UnitPrice

77

62

Products

UnitsInStock

77

51

Products

UnitsOnOrder

77

10

Region

RegionDescription

4

4

Region

RegionID

4

4

Shippers

CompanyName

3

3

Shippers

Phone

3

3

Shippers

ShipperID

3

3

Suppliers

Address

29

29

Suppliers

City

29

29

Suppliers

CompanyName

29

29

Suppliers

ContactName

29

29

Suppliers

ContactTitle

29

15

Suppliers

Country

29

16

Suppliers

Fax

13

13

Suppliers

Phone

29

29

Suppliers

PostalCode

29

29

Suppliers

Region

9

8

Suppliers

SupplierID

29

29

Territories

RegionID

53

4

Territories

TerritoryDescription

53

52

Territories

TerritoryID

53

53

XSL transformations automate the tedious process of generating a frequency count of a database. Microsoft SQL Server 2000 is limited in the number of unions allowed. For a database with a large number of fields, it is possible to exceed the capacity of Microsoft SQL Server. However, for most situations, this process works quickly.

The author, Jeffrey McArthur, is President of JSM Software, a company focused on affordable business intelligence solution using Microsoft SQL Server. You can reach Jeffrey at jeffmcarthur@jsm-software.com. For more information on JSM Software, please see http://www.jsm-software.com or contact the sales department at 1-410-290-6958.

White Papers Calculating Deciles Valid PHP Email Form Database Frequency Time Management Using Excel Y2K Problems Still Haunt Us
Windows Mobile 6.5