Wednesday, May 30, 2007

Using Scala Extractors with the XSD Schema Infoset Model

In this post I'm going to use Scala extractor objects with the Eclipse XML Schema Infoset Model to identify common ways of defining XML Schemas.

My goal is to show that Scala extractors can identify complex patterns in ordinary Java objects.
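
Before turning to the XSD model, it may help to see the idea on a tiny example. The following is a minimal sketch of an extractor over an ordinary class with Java-style getters; Person, PersonParts and describe are hypothetical names used only for illustration and are not part of any library.

```scala
// Hypothetical class with Java-style getters, standing in for a Java object.
class Person(name: String, age: Int) {
  def getName(): String = name
  def getAge(): Int = age
}

// Extractor object: unapply exposes selected getters as a tuple, so a
// Person can be deconstructed in a match as if it were a case class.
object PersonParts {
  def unapply(p: Person): Option[(String, Int)] = Some((p.getName(), p.getAge()))
}

// Extractor patterns can be combined with guards and wildcards.
def describe(p: Person): String = p match {
  case PersonParts(name, age) if age >= 18 => name + " is an adult"
  case PersonParts(name, _)                => name + " is a minor"
}
```

The Person class is never modified; the extractor alone decides which of its properties take part in pattern matching.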

The Eclipse XML Schema Infoset Model is a complex EMF model that represents W3C XML Schema. The articles "Analyzing XML schemas with the Schema Infoset Model" and "Analyze Schemas with the XML Schema Infoset Model" explain how to work with this model.

For this post I wanted to create Scala patterns that identify common design patterns in XML Schemas. There are four common design patterns for XML Schemas: Russian Doll, Salami Slice, Venetian Blind and Garden of Eden.

The article "Introducing Design Patterns in XML Schemas" provides a good explanation of each of these patterns. It also describes a feature of the NetBeans Enterprise Pack that lets the user convert a schema from one design pattern to another. More information on these design patterns can be found in the article "Global vs Local" on the xFront site.

The W3C XML Schema model is huge, but for this post I'm going to consider only a small subset.

The first step is to define the extractor objects that give access to certain properties of the XML Schema Infoset Model.


package langexplr.scalaextractorexperiments

import org.eclipse.emf.ecore.resource._
import org.eclipse.emf.ecore.resource.impl._
import org.eclipse.xsd._
import org.eclipse.xsd.impl._
import org.eclipse.xsd.util._
import org.eclipse.emf.common.util.URI
// EMF's EList is a java.util.List; asScala.toList turns it into an
// immutable Scala List (replaces List.fromIterator, removed in later Scala)
import scala.jdk.CollectionConverters._

// (target namespace, global type definitions, global element declarations)
object XSDSchemaParts {
  def unapply(schema : XSDSchema) =
    Some((schema.getTargetNamespace(),
          schema.getTypeDefinitions().asScala.toList,
          schema.getElementDeclarations().asScala.toList))
}

// (element name, element type definition)
object XSDElementParts {
  def unapply(elementDeclaration : XSDElementDeclaration) =
    Some((elementDeclaration.getName(), elementDeclaration.getTypeDefinition()))
}

// Matches complex types only: (type name, null when anonymous; content)
object XSDComplexType {
  def unapply(typeDefinition : XSDTypeDefinition) = typeDefinition match {
    case complexType : XSDComplexTypeDefinition =>
      Some((complexType.getName(), complexType.getContent()))
    case _ => None
  }
}

// Matches simple types only
object XSDSimpleType {
  def unapply(typeDefinition : XSDTypeDefinition) = typeDefinition match {
    case simpleType : XSDSimpleTypeDefinition => Some(simpleType)
    case _ => None
  }
}

// The term (element, model group or wildcard) held by a particle
object XSDParticleContent {
  def unapply(p : XSDParticle) = Some(p.getContent())
}

// Matches complex type content that is a single <xs:sequence>,
// returning the particles inside it
object XSDSimpleSequenceModelGroup {
  def unapply(complexTypeContent : XSDComplexTypeContent) =
    complexTypeContent match {
      case XSDParticleContent(mg : XSDModelGroup)
          if mg.getCompositor().getName == "sequence" =>
        Some(mg.getContents().asScala.toList)
      case _ => None
    }
}

The XSDSchemaParts, XSDElementParts, XSDComplexType, XSDSimpleType, and XSDParticleContent extractor objects expose selected properties of a model object. For example, XSDSchemaParts returns a tuple with the target namespace, the global type definitions and the global element declarations.
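
To see what returning a tuple of Scala Lists buys us without the Eclipse jars, here is a minimal sketch against a hypothetical mock model (MockType, MockElement, MockSchema and MockSchemaParts are invented for illustration; only their shape imitates XSDSchema). Because the Java lists are converted to Scala Lists, patterns such as Nil and List(_) can test the exact number of global types and elements.

```scala
import scala.jdk.CollectionConverters._

// Hypothetical stand-ins for the EMF classes: the getters return
// java.util.List, as the real XSDSchema getters return EList values.
class MockType(name: String) { def getName(): String = name }
class MockElement(name: String, typ: MockType) {
  def getName(): String = name
  def getTypeDefinition(): MockType = typ
}
class MockSchema(ns: String,
                 types: java.util.List[MockType],
                 elems: java.util.List[MockElement]) {
  def getTargetNamespace(): String = ns
  def getTypeDefinitions(): java.util.List[MockType] = types
  def getElementDeclarations(): java.util.List[MockElement] = elems
}

// Same shape as XSDSchemaParts: (namespace, types, elements) as Scala Lists.
object MockSchemaParts {
  def unapply(s: MockSchema): Option[(String, List[MockType], List[MockElement])] =
    Some((s.getTargetNamespace(),
          s.getTypeDefinitions().asScala.toList,
          s.getElementDeclarations().asScala.toList))
}

// List patterns match on the exact shape of the converted lists.
def summary(s: MockSchema): String = s match {
  case MockSchemaParts(ns, Nil, List(_)) => s"$ns: one element, no global types"
  case MockSchemaParts(ns, types, elems) => s"$ns: types=${types.size}, elements=${elems.size}"
}
```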

XSDSimpleSequenceModelGroup also provides an easy way to identify a common pattern: a complex type whose content is a single XSD sequence.
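
The interesting part of XSDSimpleSequenceModelGroup is that one extractor is built from another plus a type pattern and a guard. The sketch below reproduces that structure on a hypothetical mock (Particle, ModelGroup, ElementDecl, ParticleContent and SimpleSequence are invented names, not the XSD API), so it can run without Eclipse.

```scala
// Hypothetical mock of the particle / model-group structure.
class Particle(content: AnyRef) { def getContent(): AnyRef = content }
class ModelGroup(compositor: String, contents: List[Particle]) {
  def getCompositorName(): String = compositor
  def getContents(): List[Particle] = contents
}
class ElementDecl(name: String) { def getName(): String = name }

object ParticleContent {
  def unapply(p: Particle): Option[AnyRef] = Some(p.getContent())
}

// Succeeds only for a particle holding a "sequence" model group and
// returns the particles inside that sequence.
object SimpleSequence {
  def unapply(p: Particle): Option[List[Particle]] = p match {
    case ParticleContent(mg: ModelGroup) if mg.getCompositorName() == "sequence" =>
      Some(mg.getContents())
    case _ => None
  }
}

// Names of the elements directly inside a sequence, or Nil otherwise.
def sequenceElementNames(p: Particle): List[String] = p match {
  case SimpleSequence(children) =>
    children.collect { case ParticleContent(e: ElementDecl) => e.getName() }
  case _ => Nil
}
```

A particle holding a "choice" group falls through to the wildcard case, which is exactly how the real extractor rejects non-sequence content.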

A class will be created for each design pattern. The following trait is the base for all of them:


trait XsdDesignPattern {
  def name : String
  def identify(schema : XSDSchema) : Boolean
}



Now we can define each pattern:

Russian Doll

In this design pattern the structure of the schema mirrors the structure of the document: only one global element is declared, and all other elements are declared locally inside it.

For example:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://langexplr.blogspot.com/DocsRussianDoll"
           xmlns:p="http://langexplr.blogspot.com/DocsRussianDoll"
           xmlns="http://langexplr.blogspot.com/DocsRussianDoll"
           elementFormDefault="qualified">
  <xs:element name="page">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="header">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="content" type="xs:string" />
            </xs:sequence>
            <xs:attribute name="margin" type="xs:integer" />
          </xs:complexType>
        </xs:element>
        <xs:element name="body">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="paragraph" type="xs:string"
                          minOccurs="0" maxOccurs="unbounded" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="footer">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="content" type="xs:string" />
            </xs:sequence>
            <xs:attribute name="margin" type="xs:integer" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>


The Scala code to identify this design pattern looks like this:


class RussianDoll extends XsdDesignPattern {
  def name = "Russian Doll"
  def identify(schema : XSDSchema) =
    schema match {
      // No global types and exactly one global element whose anonymous
      // complex type is a sequence
      case XSDSchemaParts(
             namespace,
             List(),
             List(XSDElementParts(
                    name,
                    XSDComplexType(
                      null,
                      XSDSimpleSequenceModelGroup(elements))))) =>
        true
      case _ => false
    }
}
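
The Russian Doll check relies on two pattern features: List() and List(x) match lists of exactly zero and one items, and the null literal pattern matches an anonymous type (whose name is null). Here is a minimal sketch of the same shape on a hypothetical mock model; Schema, Element, ComplexType and their extractors are invented for illustration.

```scala
// Hypothetical mock: a complex type's name is null when it is anonymous.
class ComplexType(name: String, childNames: List[String]) {
  def getName(): String = name
  def getChildNames(): List[String] = childNames
}
class Element(name: String, typ: ComplexType) {
  def getName(): String = name
  def getType(): ComplexType = typ
}
class Schema(val types: List[ComplexType], val elements: List[Element])

object SchemaParts { def unapply(s: Schema) = Some((s.types, s.elements)) }
object ElementParts { def unapply(e: Element) = Some((e.getName(), e.getType())) }
object ComplexTypeParts { def unapply(t: ComplexType) = Some((t.getName(), t.getChildNames())) }

// No global types, exactly one global element, and that element's type is anonymous.
def looksLikeRussianDoll(s: Schema): Boolean = s match {
  case SchemaParts(List(), List(ElementParts(_, ComplexTypeParts(null, _)))) => true
  case _ => false
}
```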




Salami Slice

In this design pattern every element is declared at the top level, with its type defined anonymously inside the declaration; elements are nested by referring to these global declarations with ref.

For example:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://langexplr.blogspot.com/DocsSalamiSlice"
           xmlns:tns="http://langexplr.blogspot.com/DocsSalamiSlice"
           xmlns="http://langexplr.blogspot.com/DocsSalamiSlice"
           elementFormDefault="qualified">

  <xs:element name="content" type="xs:string" />
  <xs:element name="paragraph" type="xs:string" />

  <xs:element name="header">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:content" />
      </xs:sequence>
      <xs:attribute name="margin" type="xs:integer" />
    </xs:complexType>
  </xs:element>

  <xs:element name="footer">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:content" />
      </xs:sequence>
      <xs:attribute name="margin" type="xs:integer" />
    </xs:complexType>
  </xs:element>

  <xs:element name="body">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:paragraph" minOccurs="0"
                    maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="page">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="tns:header" />
        <xs:element ref="tns:body" />
        <xs:element ref="tns:footer" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>



The Scala code to identify this design pattern looks like this:


class SalamiSlice extends XsdDesignPattern {
  def name = "Salami Slice"
  def identify(schema : XSDSchema) =
    schema match {
      // No global types; nested elements must all be references
      case XSDSchemaParts(namespace, List(), elements) =>
        elementsWithReferences(elements)
      case _ => false
    }

  // Utility methods

  // True when every element is simple, empty, or has an anonymous
  // sequence whose child elements all satisfy pred
  def forAllInnerElements(l : List[XSDElementDeclaration],
                          pred : XSDElementDeclaration => Boolean) =
    l.forall {
      case XSDElementParts(
             _,
             XSDComplexType(null, XSDSimpleSequenceModelGroup(particles))) =>
        particles.forall {
          case XSDParticleContent(e : XSDElementDeclaration) => pred(e)
          case _ => false
        }
      case XSDElementParts(_, XSDComplexType(null, null)) => true
      case XSDElementParts(_, XSDSimpleType(_)) => true
      case _ => false
    }

  def elementsWithReferences(x : List[XSDElementDeclaration]) =
    forAllInnerElements(
      x,
      (e : XSDElementDeclaration) => e.isElementDeclarationReference)
}
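
The core of forAllInnerElements is forall applied to a pattern-matching function literal, so every global element must fall into one of the accepted shapes. The sketch below keeps only that idea, on a hypothetical mock where a global element either has a sequence of local particles (Some) or is a leaf (None); LocalElement, GlobalElement and allChildrenAreReferences are invented names.

```scala
// Hypothetical mock of a local particle: either ref="..." or an inline declaration.
class LocalElement(val name: String, val isReference: Boolean)
// Hypothetical mock of a global element: Some(children) for an anonymous
// sequence type, None for a simple or empty type.
class GlobalElement(val name: String, val sequence: Option[List[LocalElement]])

object GlobalElementParts {
  def unapply(g: GlobalElement): Option[(String, Option[List[LocalElement]])] =
    Some((g.name, g.sequence))
}

// Salami Slice style check: every sequence child must be a reference.
def allChildrenAreReferences(globals: List[GlobalElement]): Boolean =
  globals.forall {
    case GlobalElementParts(_, Some(children)) => children.forall(_.isReference)
    case GlobalElementParts(_, None)           => true
  }
```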




Venetian Blind

In this design pattern there is one global element, and all other elements are declared locally using types defined at the top level.

For example:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://langexplr.blogspot.com/DocsVenetianBlind"
           xmlns:tns="http://langexplr.blogspot.com/DocsVenetianBlind"
           xmlns="http://langexplr.blogspot.com/DocsVenetianBlind"
           elementFormDefault="qualified">

  <xs:complexType name="sectionType">
    <xs:sequence>
      <xs:element name="content" type="xs:string" />
    </xs:sequence>
    <xs:attribute name="margin" type="xs:integer" />
  </xs:complexType>

  <xs:complexType name="bodyType">
    <xs:sequence>
      <xs:element name="paragraph" type="xs:string" minOccurs="0"
                  maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="page">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="header" type="tns:sectionType" />
        <xs:element name="body" type="tns:bodyType" />
        <xs:element name="footer" type="tns:sectionType" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>




The Scala code for this pattern looks like this:


class VenetianBlind extends XsdDesignPattern {
  def name = "Venetian Blind"
  def identify(schema : XSDSchema) =
    schema match {
      // One global element whose sequence children all use global types
      case XSDSchemaParts(
             namespace,
             types,
             List(XSDElementParts(
                    _,
                    XSDComplexType(
                      _,
                      XSDSimpleSequenceModelGroup(elements))))) =>
        elements.forall((p : XSDParticle) =>
          elementWithTypeReferences(p, types))
      case _ => false
    }

  // True when the particle is an element whose type is declared at the
  // top level of this schema
  def elementWithTypeReferences(p : XSDParticle, types : List[XSDTypeDefinition]) =
    p match {
      case XSDParticleContent(e : XSDElementDeclaration) =>
        e.getTypeDefinition.getContainer.isInstanceOf[XSDSchema] &&
          !(types.find((t : XSDTypeDefinition) => t == e.getTypeDefinition)).isEmpty
      case _ => false
    }
}
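
The membership test !types.find(...).isEmpty can also be written with contains or exists. The sketch below compares the forms on a hypothetical TypeDef class that, like EMF model objects generally, does not override equals, so == falls back to reference identity; TypeDef and the three helper functions are invented for illustration.

```scala
// Hypothetical type definition without an equals override: == is identity.
class TypeDef(val name: String)

val sectionType = new TypeDef("sectionType")
val bodyType = new TypeDef("bodyType")
val globalTypes = List(sectionType, bodyType)

// The form used in the post: search, then check the Option is non-empty.
def isGlobalViaFind(t: TypeDef, types: List[TypeDef]): Boolean =
  !types.find(x => x == t).isEmpty

// Equivalent, more direct formulations.
def isGlobalViaContains(t: TypeDef, types: List[TypeDef]): Boolean = types.contains(t)
def isGlobalViaExists(t: TypeDef, types: List[TypeDef]): Boolean = types.exists(_ eq t)
```

A different object with the same name is not found, because the comparison is by identity rather than by name.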



Garden of Eden

In this design pattern all elements and types are declared globally. For example:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://langexplr.blogspot.com/DocsGardenOfEden"
           xmlns:tns="http://langexplr.blogspot.com/DocsGardenOfEden"
           xmlns="http://langexplr.blogspot.com/DocsGardenOfEden"
           elementFormDefault="qualified">

  <xs:complexType name="sectionType">
    <xs:sequence>
      <xs:element ref="tns:content" />
    </xs:sequence>
    <xs:attribute name="margin" type="xs:integer" />
  </xs:complexType>

  <xs:complexType name="bodyType">
    <xs:sequence>
      <xs:element ref="tns:paragraph" minOccurs="0"
                  maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="content" type="xs:string" />

  <xs:element name="paragraph" type="xs:string" />

  <xs:element name="header" type="tns:sectionType" />

  <xs:element name="body" type="tns:bodyType" />

  <xs:element name="footer" type="tns:sectionType" />

  <xs:complexType name="pageType">
    <xs:sequence>
      <xs:element ref="tns:header" />
      <xs:element ref="tns:body" />
      <xs:element ref="tns:footer" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="page" type="tns:pageType" />
</xs:schema>



The Scala code for this pattern looks like this:


class GardenOfEden extends XsdDesignPattern {
  def name = "Garden Of Eden"
  def identify(schema : XSDSchema) =
    schema match {
      case XSDSchemaParts(namespace, types, elements) =>
        elements.forall((e : XSDElementDeclaration) =>
          elementWithTypeReferences(e, types))
      case _ => false
    }

  // True when the element's type is global and, if it is one of this
  // schema's types, its content only refers to global elements;
  // built-in XML Schema types are also accepted
  def elementWithTypeReferences(e : XSDElementDeclaration, types : List[XSDTypeDefinition]) =
    e.getTypeDefinition.getContainer.isInstanceOf[XSDSchema] &&
      ((types.find((t : XSDTypeDefinition) => t == e.getTypeDefinition)) match {
        case Some(XSDComplexType(_, XSDSimpleSequenceModelGroup(particles))) =>
          particles.forall {
            case XSDParticleContent(inner : XSDElementDeclaration) =>
              inner.isElementDeclarationReference
            case _ => false
          }
        case Some(XSDComplexType(_, null)) => true
        case Some(XSDSimpleType(_)) => true
        case None =>
          e.getTypeDefinition.getTargetNamespace == "http://www.w3.org/2001/XMLSchema"
        case _ => false
      })
}
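
The Garden of Eden check matches directly on the Option returned by find: Some means the type is one of the schema's own global types, while None is accepted only for built-in XML Schema types such as xs:string. The sketch below isolates that decision on a hypothetical TypeDefinition mock (invented, with a list of booleans standing for "child particle is a ref"); it deliberately omits the getContainer check from the real code.

```scala
// Hypothetical mock: a type with a target namespace and, for each child
// particle, whether it is an element reference (ref="...").
class TypeDefinition(val name: String, val namespace: String, val childRefs: List[Boolean])

val XsdNamespace = "http://www.w3.org/2001/XMLSchema"

// Found among the schema's global types: all children must be references.
// Not found: acceptable only if it is a built-in XML Schema type.
def typeIsAcceptable(t: TypeDefinition, globalTypes: List[TypeDefinition]): Boolean =
  globalTypes.find(_ eq t) match {
    case Some(found) => found.childRefs.forall(isRef => isRef)
    case None        => t.namespace == XsdNamespace
  }
```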





Finally, we need an object that tries all the patterns on a schema:


object XsdDesignPatterns {
  def patterns : List[XsdDesignPattern] =
    List(new RussianDoll,
         new SalamiSlice,
         new VenetianBlind,
         new GardenOfEden)

  def identify(schema : XSDSchema) =
    patterns.filter((p : XsdDesignPattern) => p identify schema).map((p : XsdDesignPattern) => p.name)
}
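
The filter-then-map structure of XsdDesignPatterns.identify can be tried without a real schema. The sketch below uses a hypothetical SchemaSummary (just counts of global types and elements) and two toy detectors; these names and their count-based rules are invented for illustration and are much cruder than the extractor-based classes above.

```scala
// Hypothetical, simplified stand-in for XSDSchema.
final case class SchemaSummary(globalTypes: Int, globalElements: Int)

trait DesignPattern {
  def name: String
  def identify(schema: SchemaSummary): Boolean
}

// Toy detectors that look only at the counts.
object OneElementNoTypes extends DesignPattern {
  def name = "Russian Doll (toy check)"
  def identify(s: SchemaSummary) = s.globalTypes == 0 && s.globalElements == 1
}
object ManyElementsNoTypes extends DesignPattern {
  def name = "Salami Slice (toy check)"
  def identify(s: SchemaSummary) = s.globalTypes == 0 && s.globalElements > 1
}

object Patterns {
  val all: List[DesignPattern] = List(OneElementNoTypes, ManyElementsNoTypes)
  // Names of every pattern that accepts the schema (possibly none).
  def identify(s: SchemaSummary): List[String] = all.filter(_.identify(s)).map(_.name)
}
```

Returning a list of names rather than a single answer means a schema that follows none of the patterns simply yields an empty list.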




The code for this experiment can be found here.