Monday, June 7, 2010

Generating an MS Word 2007 file from HTML content

Hello everyone,

My teammate was given the task of editing HTML markup stored in a database and generating a Word document from the edited content. I used "DocumentFormat.OpenXml" (the Open XML SDK) to generate the Word document from the HTML. Happy to solve my teammate's problem :).

The code below uses a template Word document. A new document is generated on the server from the template, and the file can be deleted from the server once the client has downloaded it.

Add references to two DLLs, DocumentFormat.OpenXml.dll and WindowsBase.dll, in your web application/website.

Add using directives for the "DocumentFormat.OpenXml.Packaging" and "DocumentFormat.OpenXml.Wordprocessing" namespaces in your code. The code also relies on System, System.IO, System.Text and System.Data.

Code:

// Generates a Word 2007 document from the HTML content.
// Note: the fileName parameter is not used inside this method.
public static void GenerateWordDocFromHTMLContent(string templateFilePath, string documentFilePath, DataTable data, string fileName)
{
    // Start from a copy of the template, overwriting any existing file.
    File.Copy(templateFilePath, documentFilePath, true);

    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(documentFilePath, true))
    {
        MainDocumentPart mainPart = wordDoc.MainDocumentPart;
        int altChunkIdCounter = 1;
        int blockLevelCounter = 1;

        // Fetch the HTML content from the DataTable. Let us take the first record.
        string htmlContent = data.Rows[0]["HTMLMarkUp"].ToString();
        // Close the tags in the reverse order they were opened.
        string html = "<html><body>" + htmlContent + "</body></html>";
        string altChunkId = String.Format("AltChunkId{0}", altChunkIdCounter++);

        // Import the data as HTML content using an AltChunk.
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml, altChunkId);

        // Encoding.UTF8 is important so that special characters do not come out garbled.
        using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
        using (StreamWriter stringStream = new StreamWriter(chunkStream, Encoding.UTF8))
        {
            stringStream.Write(html);
        }

        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;

        // Insert the chunk at index 1 of the body, i.e. after the template's first block-level element.
        mainPart.Document.Body.InsertAt(altChunk, blockLevelCounter++);
        mainPart.Document.Save();
    }
}
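
The method reads the markup from the "HTMLMarkUp" column of the first row, so it may help to see the shape of the DataTable it expects. The following is only a minimal sketch: the class name SampleData, the method BuildSampleTable and the table name "Content" are my own illustrative names and are not part of the original code.

```csharp
using System;
using System.Data;

public static class SampleData
{
    // Hypothetical helper: builds a one-row table with the "HTMLMarkUp"
    // column that GenerateWordDocFromHTMLContent reads from.
    public static DataTable BuildSampleTable(string htmlFragment)
    {
        DataTable table = new DataTable("Content");
        table.Columns.Add("HTMLMarkUp", typeof(string));
        table.Rows.Add(htmlFragment);
        return table;
    }
}
```

With such a table you could call GenerateWordDocFromHTMLContent(templatePath, outputPath, SampleData.BuildSampleTable("<p>Hello</p>"), "report.docx"), where templatePath, outputPath and "report.docx" are placeholder values you would replace with your own.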

Note: The content type (MIME type) for a Word 2007 file is "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
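
As a rough sketch of the "download, then delete" step mentioned earlier, the helper below copies the generated file to an output stream and removes it from the server afterwards. The class WordDownload and the method SendAndDelete are hypothetical names of mine, and Stream.CopyTo assumes .NET 4.0 or later.

```csharp
using System;
using System.IO;

public static class WordDownload
{
    // MIME type for Word 2007 (.docx) files, as given in the note above.
    public const string DocxMimeType =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    // Hypothetical helper: writes the generated file to the given stream,
    // then deletes it from the server even if the copy fails.
    public static void SendAndDelete(string documentFilePath, Stream output)
    {
        try
        {
            using (FileStream input = File.OpenRead(documentFilePath))
            {
                input.CopyTo(output);
            }
        }
        finally
        {
            File.Delete(documentFilePath);
        }
    }
}
```

In an ASP.NET page you would typically set Response.ContentType to WordDownload.DocxMimeType, add a Content-Disposition header with the download file name, and pass Response.OutputStream as the output stream.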

References:
http://msdn.microsoft.com/en-us/library/ee956524%28office.14%29.aspx
http://msdn.microsoft.com/en-us/library/dd469465.aspx