Sunday, November 9, 2008

Code generation technique using MS codename "Oslo", T4 templating engine and VS Custom tool

You can download the full source code for this post here.

Not long ago Microsoft announced a new product code-named "Oslo".

Quote from Microsoft Oslo Developer Center:

"Oslo" is the code name for our platform for model-driven applications. The goal of "Oslo" is to provide a 10x productivity gain by making model-driven applications mainstream with domain-specific models, a new language, and tools.

I will not go into much detail about the "Oslo" modeling platform, as I cannot define it better than the "Oslo" SDK does.

I will quote one more definition from the MS Developer Center. The "Oslo" modeling platform contains the following parts:

  • A visual design tool (Microsoft code name “Quadrant”) that enables people to design business processes with well-understood, flowchart-like graphics; developers to design applications and components that comply with the requirements of those processes; and both to move from one view back and forth to observe the effect any changes in either place have on the overall validity of the application or business process. For more information, see "Quadrant".
  • A modeling language (Microsoft code name “M”) that makes it natural to extend system-provided models (such as Windows Communication Foundation (WCF) or Windows Workflow Foundation (WF) models) or create your own models for use on the “Oslo” modeling platform. For more information, see "M".
  • A SQL Server database (the code name “Oslo” repository) that stores models as SQL Server schema objects and model instance data as rows in the tables that implement the schema. This data is available to “Quadrant” and any other tool or data-driven application that can make use of it (and that has the appropriate permissions to do so). Whether models or model instance data is created visually, using “M”, or using any SQL data access API (for example, ADO.NET, EDM, OLE-DB, and so on) creating models and storing them in the “Oslo” repository enables future applications to examine and manipulate not only data structures used by applications but – because applications are modeled – the applications themselves, as they run. If data-driven application has enough detailed model information, applications can run without recourse to static compilation. For more information, see "Oslo" Repository.
    However, the text above, taken from the "Oslo" overview, does not say a single word about M Grammar, another cool part of the "Oslo" product.

    Most of all I was excited by this grammar language, Mg (M Grammar). It can be used to create your own textual DSLs (domain-specific languages). Here is the definition, again from MSDN:

    The MGrammar Language (Mg) was created to enable information to be represented in a textual form that is tuned for both the problem domain and the target audience. The Mg language provides simple constructs for describing the shape of a textual language – that shape includes the input syntax as well as the structure and contents of the underlying information. To that end, Mg acts as both a schema language that can validate that textual input conforms to a given language as well as a transformation language that projects textual input into data structures that are amenable to further processing or storage. The data that results from Mg processing is compatible with Mg’s sister language, The "Oslo" Modeling Language, "M", which provides a SQL-compatible schema and query language that can be used to further process the underlying information.

    In this post I will try to introduce, briefly (though I am afraid in a lot of technical detail), a sample I created over the weekend.

    1) I created my own domain-specific language using the M Grammar language from the Oslo SDK,

    2) created a T4 templating engine custom host that lets me use the T4 templating engine to generate code artifacts from the "data graph" produced by the M Grammar parser, and

    3) used a Visual Studio custom tool to generate multiple artifacts for a single DSL file and place them under that file in the Visual Studio file hierarchy.

    The domain-specific language I use in this sample is deliberately simple, to keep the sample easy to understand, but a real company-wide or product-wide DSL may be more complex. M Grammar gives you powerful options for creating complex languages. You can find some sample languages created with M Grammar under "%ProgramFiles%\Microsoft Oslo SDK 1.0\Samples\MGrammar\Languages".

     

    What does this sample do?

    The sample lets you use DSL input files in Visual Studio C# projects. The files contain code written in the language that I defined using M Grammar.

    Just create a new VS project and add one or more new files with the ".dslcontract" extension to it:

    image

    Now write some code using the syntax of our custom language:

    image

    image

    Now let's save those files and see what we get:

    image

    Cool, isn't it?

    As you can see, for each Enum construct the custom tool has generated separate files with different extensions and language-specific contents:

    image

    image

    What is nice about the approach I used to build this functionality is that you can fully control:

    1) for what type of high-level abstraction in the language the code artifacts will be generated (Enum in this case, but the language can hold Class or some other constructs in the future),

    2) for what Mg image file the code artifacts will be generated (I will explain what an Mg image file is later in the post),

    3) what types of code artifacts will be generated (the extensions and the number of files for each high-level abstraction; by "high-level abstraction" I mean the syntax rule named "HighLevelAbstraction" in my custom DSL, whose listing you can see a little further on in the post),

    4) what the contents of the generated code files will be (this is controlled through T4 templates and the custom T4 text templating host I use in the sample).

    I will describe each of the points above in more detail.

    1) (for what type of high-level abstraction the code artifacts will be generated)

    If you go to the "%CommonProgramFiles%\Saatec.Dsl.Contracts.Language" path (this is where the VS custom tool looks for resources; in production you can use a different path) you will see the following files:

    image

    Here, by creating a new T4 template (a file with the *.tt extension) and naming it Enum.*.tt, I can instruct the custom tool to generate one more artifact for each "high-level abstraction" named "Enum" in my language. See point 3 below for a description of what the "*" part of the file name is used for.
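The naming convention above can be sketched as a simple file lookup. This is a minimal, language-neutral illustration in Python, not the sample's C# code; the function name `find_templates` and the `Enum.js.tt` file name are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def find_templates(resource_dir, abstraction_kind):
    """Return the names of templates that match '<kind>.*.tt' in resource_dir."""
    # Hypothetical helper: the real custom tool may locate templates differently.
    pattern = f"{abstraction_kind}.*.tt"
    return sorted(path.name for path in Path(resource_dir).glob(pattern))


if __name__ == "__main__":
    # Build a throwaway resource folder; the file names are illustrative only.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("Enum.cs.tt", "Enum.js.tt", "Saatec.Dsl.Contract.mgx"):
            (Path(folder) / name).write_text("", encoding="utf-8")
        print(find_templates(folder, "Enum"))
```

Dropping one more `Enum.<extension>.tt` file into the folder is then enough for the lookup to return one more template, which mirrors how the sample picks up new artifacts without code changes.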

    2) (for what Mg image file the code artifacts will be generated)

    In the picture above you may have already noticed the file named "Saatec.Dsl.Contract.mgx". This file can be replaced at any moment with an updated Mg image file of my custom language (which may optionally support new constructs, rules and high-level abstractions). The grammar of the current, simple language looks like this in textual form:

    image

    You can find its listing in the subfolder named "LanguageDevelopmentTools":

    image

    Here I placed:

    a) a shortcut to "Intellipad", the tool used to create textual DSLs using M Grammar,

    b) Saatec.Dsl.Contract.mg, the M Grammar file containing the language definition (syntax etc.) for our DSL,

    c) Test.dsl.contract, a test input file that can be used to see what output graph is generated for the language,

    d) PackLanguage.bat, the batch file that compiles the language (.mg) file into an .mgx image file and places it one level up in the folder hierarchy (for use by our VS custom tool). I will include a screenshot of the "Intellipad" tool in action later in the post.

    3) (what types of code artifacts will be generated)

    I already mentioned that by creating an Enum.*.tt file it is possible to instruct the custom tool to use this T4 template to generate an artifact for each "high-level abstraction" named "Enum" in my language. By specifying an extension in place of the "*" sign we instruct our VS custom tool to generate a file with that extension. For example, after finding the file "Enum.cs.tt" in this folder, the custom tool will generate a file named "<abstraction name>.cs" (MessageBoxButtons.cs in our example).
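The mapping from template name to generated file name can be illustrated with a few lines of Python. This is only a sketch of the rule described above, assuming template names always have exactly three dot-separated parts; `output_file_name` is a hypothetical helper, not a function from the sample.

```python
def output_file_name(template_name, abstraction_name):
    """Map '<Kind>.<ext>.tt' plus an abstraction name to '<abstraction>.<ext>'."""
    parts = template_name.split(".")
    # Assumption: exactly three parts, e.g. "Enum.cs.tt" -> ["Enum", "cs", "tt"].
    if len(parts) != 3 or parts[2].lower() != "tt":
        raise ValueError(f"unexpected template name: {template_name!r}")
    extension = parts[1]
    return f"{abstraction_name}.{extension}"


if __name__ == "__main__":
    print(output_file_name("Enum.cs.tt", "MessageBoxButtons"))  # MessageBoxButtons.cs
```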

    4) (what will be the contents of generated code files)

    Template files support the text templating syntax and, after being processed by the templating engine, produce output for each single "high-level abstraction" in my language. T4 templates can (and in most cases will) be host-specific, which lets them get a reference to the hierarchical data graph produced by the DSL parser. See more information about the "hostspecific" directive here.

    A typical T4 template looks like this:

    image

    This template file is named Enum.cs.tt, which means it will be used to generate a C# code file for each Enum "high-level abstraction" in my domain-specific language. You can also see that I use the GetOption method of the Host to get a reference to the data graph. Honestly, I don't know what GetOption is normally meant to be used for; I just noticed that this method is not called from inside the templating engine, and decided it was a good method to override as a mechanism for passing my data graph to T4 templates. As you can see, the graph is referenced at this line:

    image

    In the custom host implementation I overrode this method to return the data graph produced by the DSL parser:

    image

    where the mContractGraph is of type ContractGraph:

    image

    This graph gives a hierarchical view of what was in the input DSL code file. Now we can use many of the cool features of T4 to create artifacts. To make this possible I used a custom text templating host, which makes sure the templates work correctly and provides the reference to the data graph. We will look at its implementation in more detail later (maybe in the next post?).
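The idea of handing the graph to a template through the host's option lookup can be sketched as follows. This is a conceptual Python illustration, not the T4 API or the sample's C# host: the `GraphHost` class, the `"ContractGraph"` option key, the fields of `ContractGraph` and the `render_summary` stand-in for a template are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ContractGraph:
    """Hypothetical shape of one parsed abstraction; the real class may differ."""
    kind: str
    name: str
    members: list = field(default_factory=list)


class GraphHost:
    """Stand-in for a custom templating host that exposes the graph as an option."""

    def __init__(self, graph):
        self._graph = graph

    def get_option(self, option_name):
        # Assumed option key; any other name yields no value.
        if option_name == "ContractGraph":
            return self._graph
        return None


def render_summary(host):
    """Plays the role of a host-specific template: it asks the host for the graph."""
    graph = host.get_option("ContractGraph")
    return f"{graph.kind} {graph.name}: {', '.join(graph.members)}"


if __name__ == "__main__":
    host = GraphHost(ContractGraph("Enum", "MessageBoxButtons", ["OK", "Cancel"]))
    print(render_summary(host))
```

The point of the design is that the template never parses anything itself; it only asks the host for an already populated graph.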

    Now what about error handling? Currently the sample dumps all error texts into a file placed under the .dslcontract file, with the same name as the contract but a .txt extension. You should check this file after generation to be sure that the process went smoothly. The file accumulates errors raised by the DSL parser and the T4 templating engine. Errors caused by the custom tool itself can be viewed by clicking "Run Custom Tool" explicitly:

    image
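The error file convention described above (same name as the contract, .txt extension) might look roughly like this. It is a hedged Python sketch rather than the sample's implementation; the function names and the `[parser]` / `[template]` prefixes are made up for illustration.

```python
import tempfile
from pathlib import Path


def error_log_path(contract_path):
    """Same location and base name as the contract file, with a .txt extension."""
    return Path(contract_path).with_suffix(".txt")


def write_error_log(contract_path, parser_errors, template_errors):
    """Accumulate parser and templating errors into one text file."""
    # The prefixes are an assumption; the sample may format errors differently.
    lines = [f"[parser] {error}" for error in parser_errors]
    lines += [f"[template] {error}" for error in template_errors]
    log_path = error_log_path(contract_path)
    log_path.write_text("\n".join(lines), encoding="utf-8")
    return log_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        contract = Path(folder) / "Buttons.dslcontract"
        log = write_error_log(contract, ["unexpected token"], [])
        print(log.name, "->", log.read_text(encoding="utf-8"))
```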

     

    What can this sample be used for?

    You can create simple DSLs and generate multiple artifacts for input code files written in the syntax of those languages.

    First of all, a DSL gives you the opportunity to create a simpler syntax than that of the languages you want to generate code for.

    Secondly, you can generate multiple code files in several languages from a single DSL code file. This can give a productivity boost in situations where a lot of "plumbing" code is needed, or when you have to generate similarly structured code in several languages. For example, in the sample I generate enumerations in C# and JavaScript. This takes typing 20-40 characters and hitting "Save", as opposed to the several hundred characters you would normally have to write.

    And finally, a DSL is language-agnostic: it does not depend on the syntax of C# or any other language. It is parsed into a data graph that can be used to generate code in a specific language as well as XML, JSON, DB or any other structure.
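To make the "one definition, several target languages" point concrete, here is a small Python sketch that renders the same enum description as C# and as JavaScript. The output shapes are assumptions chosen for illustration; the actual T4 templates in the sample may produce different code, and the member names are invented.

```python
def render_csharp(namespace, name, members):
    """Render an enum as C# source text (illustrative layout)."""
    body = "\n".join(f"        {member}," for member in members)
    return (
        f"namespace {namespace}\n{{\n"
        f"    public enum {name}\n    {{\n{body}\n    }}\n}}\n"
    )


def render_javascript(name, members):
    """Render the same enum as a plain JavaScript object (illustrative layout)."""
    body = ",\n".join(f"    {member}: {index}" for index, member in enumerate(members))
    return f"var {name} = {{\n{body}\n}};\n"


if __name__ == "__main__":
    members = ["OK", "OKCancel", "YesNo"]  # hypothetical enum members
    print(render_csharp("Demo", "MessageBoxButtons", members))
    print(render_javascript("MessageBoxButtons", members))
```

Both renderers consume the same name and member list, so adding a third target (XML, JSON and so on) only means adding another renderer, not touching the DSL input.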

     

    Implementation

    The architecture of the sample can be presented as follows:

    image

    Sorry for my designer skills :)

    OK, so the data flows as follows:

    1) The user clicks "Save" or "Run Custom Tool" on the original file in Visual Studio.

    2) The custom tool gets a reference to the calling item, its text contents, the default namespace etc. The custom tool calls the "Code Generator", passing it the input text and the default namespace.

    3) The Code Generator calls the DSL Language Parser to populate the data graph. It passes only the original file's input to it.

    4) The DSL Language Parser looks for the language package on disk in a predefined location (a configuration-driven approach can be added later), loads it, parses the input text and iterates through all nodes in the graph, populating an internal, more user-friendly data structure that will later be used in T4 templates.

    5) A collection of graphs, one for each abstraction in the input file, is returned to the Code Generator block.

    6) The Code Generator block searches a predefined place on the hard disk to find the T4 template files for each graph. Graphs contain information about what kind of abstraction they represent. Using the string name of an abstraction we can find the actual templates to use.

    7) The Code Generator block instantiates a new text templating engine in a new AppDomain and passes it the graph, the template file name and the custom host reference, which starts the text generation process.

    8) The text generation process is repeated for each graph and for each template found for that graph type.

    9) The collection of generated code strings is passed back to the custom tool, which in turn uses Visual Studio automation to create multiple files on disk and nest them under the original file in the Visual Studio project.

    10) Finally, Visual Studio has the generated files and is ready to build.
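The core of the flow above (parse, collect graphs, pick templates by abstraction kind, render each one, return named outputs) can be sketched end to end in Python. Everything here is a simplified stand-in: the `Kind Name { A, B }` input syntax is hypothetical and is not the sample's DSL, the regex replaces the M Grammar parser, and plain callables replace T4 templates and the separate AppDomain.

```python
import re


def parse(text):
    """Stand-in parser: one small graph (a dict) per abstraction in the input."""
    graphs = []
    # Hypothetical syntax: "<Kind> <Name> { member, member }".
    for match in re.finditer(r"(\w+)\s+(\w+)\s*\{([^}]*)\}", text):
        kind, name, body = match.groups()
        members = [item.strip() for item in body.split(",") if item.strip()]
        graphs.append({"kind": kind, "name": name, "members": members})
    return graphs


def generate(text, default_namespace, templates):
    """Run every matching template for every graph; return {file name: content}."""
    outputs = {}
    for graph in parse(text):
        for (kind, extension), render in templates.items():
            if kind == graph["kind"]:
                file_name = f"{graph['name']}.{extension}"
                outputs[file_name] = render(graph, default_namespace)
    return outputs


if __name__ == "__main__":
    # Templates keyed by (abstraction kind, output extension); bodies are toy examples.
    templates = {
        ("Enum", "cs"): lambda g, ns: f"// C# enum {ns}.{g['name']}",
        ("Enum", "js"): lambda g, ns: f"// JavaScript enum {g['name']}",
    }
    result = generate("Enum MessageBoxButtons { OK, Cancel }", "Demo", templates)
    for file_name, content in sorted(result.items()):
        print(file_name, "=>", content)
```

The returned dictionary corresponds to the collection handed back to the custom tool in step 9, which then writes each entry as a file nested under the original DSL file.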

    First of all, to be able to run the sample code you have to download the Oslo SDK from the Microsoft website.

    It will install all the tools and assemblies needed to author and parse textual DSLs, most specifically M Grammar.

    You can then quickly review the "M Grammar in a Nutshell" document, which will be available under the Oslo program shortcut.

    It is really simple to get into M Grammar quickly. You will have to use the Intellipad tool to create your own language. It has good features such as syntax highlighting, error messages for syntax errors in M Grammar code and more.

    The tool in action looks like this:

    image

    Here the language itself is in the center pane,

    the left pane contains sample input for our language, and

    the right pane contains the graph that is generated after parsing the input.

    If there are any errors in the input code you will see their details in the lower pane.

    There are more advanced ways of working with complex languages, such as using multiple files; see the documentation and the very nice samples that come with the Oslo SDK.

    After we have created a language file, we can package it into a file with the .mgx extension, which can later be used by our applications to parse user input. What I really liked about M Grammar is that it lets you parse input dynamically, which means that languages can be used at runtime in our applications.

     

    So now we have our language file created and packed. Let's look at the VS solution for the sample.

    image

    Solution consists of four projects:

    CustomTool - the project containing the class that implements the IVsSingleFileGenerator interface and its methods "Generate" and "GetDefaultExtension". That class is called by VS when we hit "Save" on a file that uses the custom tool. The custom tool needs special registry keys to be set that point to its location. The custom tool assembly should also be registered with the regasm tool to be visible to the VS COM process. I created an "Install.bat" file that performs these actions automatically after each build of the solution. I also created an "Uninstall.bat" file, which is called each time before "Install.bat" is called. You can execute those files manually to uninstall or install our custom tool.

    Generator - this project implements all the functionality used to parse our DSL. It should not be registered in the GAC, because it references Oslo assemblies, which for some reason I could not register in the GAC at the time of writing this post.

    Interface - this is the assembly that contains the shared classes and interfaces needed for communication between the other assemblies in the solution. It should be registered in the GAC. (This is done automatically by the "Install.bat" file I mentioned above.)

    Generator.Host - this project implements the custom text templating host. It is used to execute T4 templates and provide them with the graph for code generation. You can look at its implementation yourself; I really did not expect implementing a custom host to be so simple. I just overrode several methods and properties to make it work in an environment and AppDomain separate from Visual Studio.

     

    Some notes for getting started with the sample:

    After a successful build of the solution, please copy this directory:

    image

    to the path: %CommonProgramFiles%\Saatec.Dsl.Contracts.Language

    This is where the custom tool will search for templates and the language.

     

    The Install.bat and Uninstall.bat files that register the custom tool can be found here:

    image

    If you find this sample good enough for production, you will have to create a VS setup project, get rid of the .bat and .reg files and use the features of the setup project instead.

     

    To debug the custom tool, start another Visual Studio instance and attach to it from the VS instance that has the custom tool solution open. Then, after hitting "Save" on a file that uses our custom tool in the VS instance being debugged, you can hit breakpoints in the custom tool code.

     

    For generating multiple files inside VS I use the code provided by Adam Langley (see the original article below).

    I think the post is getting too long, so here is the source code. Download and use it at your own risk :)

     

    These two articles helped me a lot in understanding how T4 works under the hood, and how multiple files can be generated using T4 and a VS custom tool:

    How to generate multiple outputs from single T4 template

    Creating a Custom Tool to Generate Multiple Files in Visual Studio 2005

    Thanks to the authors, Oleg Sych and Adam Langley, for the great articles and source code.

     

    Happy programming!
