Monday, June 7, 2010

Generating an MS Word 2007 file from HTML content

Hello everyone,

My teammate was given the task of editing HTML markup stored in a database and generating a Word document from the edited HTML. I used "DocumentFormat.OpenXml" to generate the Word document from the HTML content. Happy to solve my teammate's problem :).

The code below uses a template Word document. The template is copied to a new document on the server, and the HTML content is added to that copy. The file can be deleted from the server once the client has downloaded it.

Add references to two assemblies, DocumentFormat.OpenXml.dll and WindowsBase.dll, in your web application or website.

Add using directives for the namespaces "DocumentFormat.OpenXml.Packaging" and "DocumentFormat.OpenXml.Wordprocessing" in your code. The method also uses types from System.IO, System.Text and System.Data.

Code:

// Using directives needed at the top of the file.
using System;
using System.Data;
using System.IO;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Generates a Word 2007 (.docx) document from the HTML content.
// Note: the fileName parameter is not used inside this method.
public static void GenerateWordDocFromHTMLContent(string templateFilePath, string documentFilePath, DataTable data, string fileName)
{
    // Start from a copy of the template, overwriting any existing file.
    File.Copy(templateFilePath, documentFilePath, true);

    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(documentFilePath, true))
    {
        MainDocumentPart mainPart = wordDoc.MainDocumentPart;
        int altChunkIdCounter = 1;
        int blockLevelCounter = 1;

        // Fetch the HTML content from the DataTable. Let us take the first record.
        string htmlContent = data.Rows[0]["HTMLMarkUp"].ToString();
        string html = "<html><body>" + htmlContent + "</body></html>";
        string altChunkId = String.Format("AltChunkId{0}", altChunkIdCounter++);

        // Import the HTML content using an altChunk.
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml, altChunkId);

        // Encoding.UTF8 is important: it keeps stray special characters
        // (garbled non-ASCII text) out of the generated document.
        using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
        using (StreamWriter stringStream = new StreamWriter(chunkStream, Encoding.UTF8))
        {
            stringStream.Write(html);
        }

        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;

        // Insert the chunk at index 1 of the body's children, so the
        // template body must already contain at least one element.
        mainPart.Document.Body.InsertAt(altChunk, blockLevelCounter++);
        mainPart.Document.Save();
    }
}
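
To show how the method might be called, here is a minimal sketch. The class name WordHelper, the file paths and the sample markup are all made up for illustration. The only thing taken from the code above is the "HTMLMarkUp" column that the method reads from the first row.

```csharp
using System.Data;

public static class WordExportExample
{
    public static void Run()
    {
        // Hypothetical table holding the one column the method reads.
        DataTable data = new DataTable();
        data.Columns.Add("HTMLMarkUp", typeof(string));
        data.Rows.Add("<h1>Report</h1><p>Edited <b>HTML</b> content.</p>");

        // Hypothetical paths; the template must be an existing .docx file.
        string templatePath = @"C:\Temp\Template.docx";
        string outputPath = @"C:\Temp\Report.docx";

        // Assumes GenerateWordDocFromHTMLContent lives in a static class
        // named WordHelper (an assumed name, not from the original post).
        WordHelper.GenerateWordDocFromHTMLContent(templatePath, outputPath, data, "Report.docx");
    }
}
```

In a real application the DataTable would come from the database query that returns the edited markup.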

Note: The content type (MIME type) for a Word 2007 file is "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
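
As a rough sketch of the download step, assuming a classic ASP.NET (System.Web) application, the generated file could be sent with that content type and then removed from the server. The class and method names here are hypothetical. Reading the file into memory first is my own choice, so that the file can be deleted before the response is written.

```csharp
using System.IO;
using System.Web;

public static class WordDownloadExample
{
    // Hypothetical helper: sends the generated document and deletes it.
    public static void SendAndDelete(HttpResponse response, string documentFilePath, string downloadName)
    {
        // Load the whole file so it can be deleted before sending.
        // Fine for small documents; large files would need streaming.
        byte[] bytes = File.ReadAllBytes(documentFilePath);
        File.Delete(documentFilePath);

        response.Clear();
        response.ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
        response.AddHeader("Content-Disposition", "attachment; filename=\"" + downloadName + "\"");
        response.BinaryWrite(bytes);

        // End() stops the page from appending its own markup to the download.
        response.End();
    }
}
```

From a page's code-behind this could be called as WordDownloadExample.SendAndDelete(Response, documentFilePath, "Report.docx").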
