Tuesday, 19 October 2010

Document Generation with Word 2007 and ASP.NET

I was asked to look at this for a project and spent many an hour trawling the interwebs for solutions. What makes it hard is that MS have a tendency to change technologies frequently (although generally for the better), and their MSDN articles are galactic in size, so trawling through the various guides, APIs, white papers, support pages and the rest takes time.
Anyway, I've found some helpful articles and thought I would write an easy-to-follow guide to getting the basics working.
If you want to generate a document on the fly, chances are that you want to populate it with dynamic data from a database or some other data source (the actual source is not important). Traditionally, this has had to be done with various horrible COM technologies or by directly hacking Word 2003 XML files.
Word 2007 has the ability to link data controls on the page to XML data in the document, and that XML can be generated dynamically. A Word 2007 document is actually a ZIP file (rename it and see!) containing the various related XML files which together form the overall document. The XML we will use to generate docs is custom XML and can follow any schema or none. In my case it is a very simple XML file with nothing more than the XML declaration and a single entity (Quote) with 3 child elements. Because this custom XML is separate from the rest of the document, I can change it at my leisure without affecting any formatting.
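To make the shape of that custom XML concrete, here is a minimal sketch of building such a file in C# with LINQ to XML. The Quote root and the TotalSellingPrice element match the XPath used in the add-in code further down; CustomerName and QuoteNumber are made-up placeholders, and the QuoteXml class itself is only for illustration, not something from my project.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class QuoteXml
{
    // Builds the custom XML: a single Quote root with three child elements.
    // CustomerName and QuoteNumber are placeholder names for illustration only.
    public static XDocument Build(string customerName, string quoteNumber, decimal totalSellingPrice)
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XElement("Quote",
                new XElement("CustomerName", customerName),
                new XElement("QuoteNumber", quoteNumber),
                // Format with the invariant culture so the output does not depend on server locale
                new XElement("TotalSellingPrice",
                    totalSellingPrice.ToString("0.00", CultureInfo.InvariantCulture))));
    }
}
```

Calling QuoteXml.Build("Acme Ltd", "Q-1001", 1250m).Save(@"c:\CustomerData.xml") would write a file you could use as the starting custom XML.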
Now when adding these data controls, known as Content Controls, there are certain things that are not that obvious. Firstly, the controls can only be linked easily to the XML using code. There is a whole schema thing where you can define a schema and tell Word about it, but I went another route: I created a Ribbon add-in for Word and used C# code to add a correctly linked Content Control to a document. These can be added without the XML being present (they just won't have any data until the XML appears at some point). If you create a Word add-in project in Visual Studio and add a Ribbon (XML) component, which is fairly easy, you can then add code which will look something like this (note the public modifier on the button handler):

// In the handler for a certain button - Ribbon1.cs
public void OnSellingPriceButton(Office.IRibbonControl control)
{
    Globals.ThisAddIn.CreateBoundDataItem("Total Sale", "TotalSellingPrice");
}

The reason this calls a function via the AddIn is that the AddIn has access to the underlying document whereas the ribbon doesn't (although I think it is possible to do something weird to gain access). The add-in code in my case is then:

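// ThisAddIn.cs. Assumes VSTO 3.0 SP1 (.NET 3.5) with these using directives:
//   using Microsoft.Office.Tools.Word.Extensions; // provides GetVstoObject()
//   using Tools = Microsoft.Office.Tools.Word;
private static int Count = 0; // suffix used to give each control a unique name

public void CreateBoundDataItem(string theTitle, string theProperty)
{
    // Get the VSTO wrapper so the typed Controls collection is available
    Microsoft.Office.Tools.Word.Document doc = Globals.ThisAddIn.Application.ActiveDocument.GetVstoObject();
    Tools.PlainTextContentControl plainTextControl1 = doc.Controls.AddPlainTextContentControl("plainTextControl" + Count.ToString());
    plainTextControl1.Title = theTitle;
    // Bind to the element under /Quote; the XPath depends on your own XML
    plainTextControl1.XMLMapping.SetMapping("/Quote/" + theProperty, null, null);
    ++Count;
}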

Special notes here! You CANNOT bind XML to the rich text control, so use a plain text control. Also, you need to get the VSTO object in order to get the correctly typed document and access its controls collection. Also, when adding the control, you have to provide a unique name, so I use a static int variable in this class to generate one.
Obviously the SetMapping code (XPath) will be unique to your XML format; mine, as I said, is very simple. Also, the title property is what appears on the control tab when it is inserted into the document (as an aide-memoire). You can set other properties here, like whether the data is read-only and whether the control can be deleted.
The cool thing about the add-in is that once it is built, you simply copy the output from the bin/Release directory into your AddIns directory in AppData (search for AddIns) and Word will automatically show the ribbon.
Once the ribbon is installed, it will create content controls pointing to whatever part of the data you need. At this point there may or may not be any XML in the custom parts of the document. However, if you are using the Packaging classes in C# (which we use at the server end), it appears that they can replace the XML but not create it (might be some permissions thing?). So you will need to make sure that the document starts with a correctly formed XML custom part, even if the element data in it is blank or invalid. You can do this by running some VBA in Word 2007 with your document open (you will need to enable the Developer tab in Word Options). Open the Visual Basic editor and, in the Immediate window, type each of these lines, pressing Enter after each one to run it:

ActiveDocument.CustomXMLParts.Add
ActiveDocument.CustomXMLParts(4).Load ("c:\CustomerData.xml")

You need to use index 4 because the doc already has 3 built-in parts, so the one you just added is the fourth; the path to the XML will obviously be wherever yours is. Word will create an item called item1.xml in the customXml directory inside your docx ZIP structure, and the fields will immediately be available to your content controls.
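If you would rather check from code than by renaming the file and poking around the ZIP, something along these lines should list the custom XML parts. It is a rough sketch using the System.IO.Packaging classes (which need a reference to WindowsBase.dll on the .NET Framework); the DocxInspector class and its method name are my own invention for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging; // in WindowsBase.dll on .NET Framework

public static class DocxInspector
{
    // Returns the URIs of all parts stored under /customXml/ in the package.
    public static List<string> ListCustomXmlParts(Stream docxStream)
    {
        var found = new List<string>();
        using (Package package = Package.Open(docxStream, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                // Part URIs are relative to the package root, e.g. /word/document.xml
                string uri = part.Uri.OriginalString;
                if (uri.StartsWith("/customXml/", StringComparison.OrdinalIgnoreCase))
                {
                    found.Add(uri);
                }
            }
        }
        return found;
    }
}
```

Open the saved document with File.OpenRead and pass the stream in; if the part was added successfully, the list should include an entry under /customXml/ for it.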
That is the document part done. You have a couple of alternatives now, depending on what you are doing, but both work in the same way. You can either keep a single document with the custom XML present but no content and use this as a basis for any generated documents, or you can save a bunch of different documents, each with the custom XML in place, and then work on these individually or together. It depends on whether your generation is one-way or whether the documents will be loaded and saved.
The next post will be the server side which will look at the various options.