BIML and its secret sauce.

I’ve been using BIML for a while now and while I’ve seen a lot of interesting blogs and articles on the web I’ve not really come across something that covers my experience with it. So here I am writing a blog about it.

This isn’t an introduction to BIML. There are several really good intros out there. I can recommend Andy Leonards Stairway to BIML series. I’d also point anyone looking to start BIML at the tutorials and walkthroughs on bimlscript.com

There’s also a fantastic blog post here about some top tips for developing BIML and I’d agree with all of these pretty much wholesale. This is a more advanced look at an approach to using BIML in what I hope is a directly applicable manner.

I work as a BI consultant. This involves me building lots of ETL to get the data staged so I can push it out using cubes or tabular models. Sometimes its just about creating a tidy well organised data source that people write good old fashioned SSRS reports off. But in all cases it starts with a spreadsheet. This is basically a source to target mapping. It tells me what data I should be pulling, from where, and where I should be pushing it. Mapping this is a pretty dull job but its always necessary. And if you’re going to do any kind of requirements gathering or gap analysis this is unavoidable.

I’ll come back to the spreadsheet in my next post and share a bit of PowerShell cleverness that can turn the spreadsheet into BIML that defines the tables and columns you want to manipulate in your ETL and this is the secret sauce that unlocks BIMLs power.

A lot of the blogs I see focus on how to use BIML to produce SSIS packages, and that is of course what I use it for. This is because this functionality ships free as part of BIDS Helper. And this is most peoples focus. BIML gets into its stride when you start adding some scripting to make the BIML code iterate over a set of tables and columns and there are a few examples out there that point you at this aspect of it. Many of these are oriented towards pointing the solution at an existing database and using its meta data to define the tables and columns you want to import.

But there is another section of the BIML definition that allows you to define tables and columns.

Essentially you can define a table with some columns using some BIML like this

<Table Name="thingy" SchemaName="aschema">
  <Columns>
    <Column Name="somecolumn" DataType="AnsiString" Length="50">
  </Columns>
</Table>

You can have several tables all contained in

<Tables>
…
</Tables>

Now this doesn’t look enormously clever. And indeed it isn’t at this point. The types you use within these definitions aren’t SQL datatypes but .Net types and the reference for these is one of my most used links whenever I’m developing BIML. Its here.

The really interesting thing about this is that you are defining a set of META data that you can iterate over in your BIML with some code like this

<# for each table in tables { #>
   <!-- do something for each table like build a SSIS package -->
   <# for each column in table.columns #>
      <!-- do something for each column… -->
   <# } #>
<# } #>

Now this is the clever bit. Everything in BIML can have a set of child ANNOTATION XML tags added to it. There are different ANNOTATION types. While the Description type is useful and I use it a lot in my packages, the more useful and grossly overlooked tag is the AnnotationType=”Tag”. This can be used to add data to a thing that isn’t part of the BIML definition. Its part of the meta data about the thing you are defining.

So you can define a table like this

<Tables>
   <Table Name="thingy" SchemaName="aschema">
      <Annotations>
        <Annotation AnnotationType="Tag" Tag="Something">SomeValue</Annotation>
        <Annotation AnnotationType="Tag" Tag="SomethingElse">SomeOtherValue</Annotation>
      </Annotations>
      <Columns>
         <Column Name="somecolumn" DataType="AnsiString" Length="50">
      </Columns>
   </Table>
</Tables>

Now when we want to iterate over the tables/columns in BIML there is a function available on pretty much everything called GetTag(). This returns the values defined on the thing for the Annotation with AnnotationType=”Tag” and you pass in which Tag=”” value you want to lookup. If the specified tag isn’t defined the function returns an empty string. If the tag exists however you get the value. So specifying column.GetTag(“Something”) would return “SomeValue” in the above example.

So  you can write BIML code like this

<# for each table in tables { #>
   <!-- do something for each table like build a SSIS package -->
   <# if (table.GetTag("Something") == "SomeValue"{ #>
      <!-- Do one thing -->
   <# } else { #>
      <!-- Do something completely different -->
   <# } #>
<# } #>

So we can now drive the BIML execution by adding extra meta data to the table definitions. We can for instance cycle through all the tables that use CSV files as a source type and build packages for those, and build different packages for “tables” sourced from a SQL Instance. We can handle columns differently based on meta data about the column, do lookups for example, or do CAST/CONVERT operations. All by adding Annotations to the BIML Table/Column definition.

To make use of this you have to jump through a couple of hoops. The tables have to belong to a schema. And the scheme has to belong to a DB. And the DB has to actually belong to something real, like an actual connection (you may be able to create a virtual connection but I haven’t figured out how). After some trial and error pushing various combinations of files into the BIML engine in BIDS Helper and responding to the various (rather helpful) errors that I’ve created, I’ve found I need to have each of these definitions in separate BIML files but they do need to exist. The only external dependency is that I have an empty DB called TemplateDB on my dev box which is where I point the underlying connection that eventually lies at the root of all this to make it work.

When trying to validate the BIML or build any subsequent packages you have to have all these BIML files in a single project and you have to select them all before you right click and select Check BIML for Errors, or Create SSIS Packages. Selecting more than one file to undertake an action is a bit counter-intuitive to those of used to SSIS and SSRS development in BIDS but it works. And of course you can add another BIML file that actually generates packages to the selected set. This can use the “tables” defined in the BIML model (BIML definition files?) to emit packages.

I’m attaching a ZIP file which has the files mentioned above examples. Note that Creating SSIS Packages will do nothing with these examples as there isn’t any BIML as yet for doing any package generation. You’ll also need to crate an empty DB on your local instance called TemplateDB as the connection gets validated when running or checking the BIML.

Advertisements
About

Can't sing, dance or tell jokes. But basically still a nice guy.

Posted in BIML, Config Driven ETL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: