Wednesday, September 24, 2008

Convert MS/Word to PDF in unix/linux

OpenOffice.org has powerful support for macros, which allow a lot of additional functionality to be added to the application. One common task is converting MS/Word documents (.doc) into PDF. The recipe here uses OpenOffice.org Basic to write a macro that converts from DOC to PDF, and then shows how to turn this into a command line tool. (This example was developed by DannyB.)

First, start up OpenOffice.org, perhaps as oowriter. Then, from the Tools menu, select Macros, Organize Macros, OpenOffice.org Basic. A window will pop up. In the Macro from area, navigate to My Macros, Standard, Module1. Edit the module so that it contains just the following code:

REM ***** BASIC *****

' Convert the document at path cFile to a PDF with the same base name.
' Note: the output name is built by chopping the last four characters
' (".doc") off cFile, so this assumes a three-letter extension.
Sub ConvertWordToPDF(cFile)
    cURL = ConvertToURL(cFile)

    ' Open the document hidden (without showing a window).
    ' Just blindly assume that the document is of a type that OOo will
    ' correctly recognize and open -- without specifying an import filter.
    oDoc = StarDesktop.loadComponentFromURL(cURL, "_blank", 0, Array(MakePropertyValue("Hidden", True)))

    ' Replace the ".doc" extension with ".pdf".
    cFile = Left(cFile, Len(cFile) - 4) + ".pdf"
    cURL = ConvertToURL(cFile)

    ' Save the document using the Writer PDF export filter.
    oDoc.storeToURL(cURL, Array(MakePropertyValue("FilterName", "writer_pdf_Export")))

    oDoc.close(True)
End Sub

' Helper: build a com.sun.star.beans.PropertyValue from a name and a value.
Function MakePropertyValue( Optional cName As String, Optional uValue ) As com.sun.star.beans.PropertyValue
    Dim oPropertyValue As New com.sun.star.beans.PropertyValue
    If Not IsMissing( cName ) Then
        oPropertyValue.Name = cName
    EndIf
    If Not IsMissing( uValue ) Then
        oPropertyValue.Value = uValue
    EndIf
    MakePropertyValue = oPropertyValue
End Function


Save and exit from OpenOffice.org.

Now create a shell script, perhaps called doc2pdf, in /usr/local/bin with the following contents, and make it executable (chmod +x /usr/local/bin/doc2pdf):

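#!/bin/sh
# Usage: doc2pdf FILE.doc
# Give the macro a full path: prefix relative names with the
# current directory and leave absolute paths unchanged.
case "$1" in
    /*) DOC="$1" ;;
    *)  DOC="$(pwd)/$1" ;;
esac

/usr/bin/oowriter -invisible "macro:///Standard.Module1.ConvertWordToPDF($DOC)"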


Then simply run it:

$ doc2pdf my.doc


and you should end up with my.pdf in the same directory!
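The output name comes from the macro line cFile = Left(cFile, Len(cFile) - 4) + ".pdf", which simply drops the last four characters of the path and appends ".pdf". The sketch below reproduces that rule in the shell with a hypothetical pdf_name helper (not part of the recipe), which can be handy for predicting where the output will land. It also shows why a .docx input would end up with an odd name.

```shell
#!/bin/sh
# pdf_name is a hypothetical helper, not part of doc2pdf. It mirrors the
# macro's naming rule Left(cFile, Len(cFile) - 4) + ".pdf": the POSIX
# expansion ${1%????} removes exactly the last four characters.
pdf_name() {
    printf '%s.pdf\n' "${1%????}"
}

pdf_name my.doc        # prints my.pdf
pdf_name report.docx   # prints report..pdf (the rule assumes a 3-letter extension)
```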

The script is far from perfect. In particular, it returns before OpenOffice.org has finished its work. Thus, to convert a whole directory of files, you may want something like the following, which pauses five seconds after each file:

$ for i in *.doc; do echo "$i"; doc2pdf "$i"; sleep 5; done
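Rather than sleeping for a fixed time, one could poll for the output PDF. The sketch below assumes the doc2pdf script above is on the PATH; wait_for_file and convert_all are hypothetical helpers introduced here for illustration. Two caveats: a PDF that exists and is non-empty may still be partially written, and convert_all deletes any existing PDF of the same name before converting, so that it does not mistake a stale file for a fresh one.

```shell
#!/bin/sh
# wait_for_file FILE TIMEOUT (hypothetical helper, not part of the recipe).
# Polls once per second until FILE exists and is non-empty, giving up
# after TIMEOUT seconds. Returns 0 on success, 1 on timeout.
wait_for_file() {
    wff_file=$1
    wff_timeout=$2
    wff_waited=0
    while [ ! -s "$wff_file" ]; do
        if [ "$wff_waited" -ge "$wff_timeout" ]; then
            return 1
        fi
        sleep 1
        wff_waited=$((wff_waited + 1))
    done
    return 0
}

# convert_all (hypothetical): convert every .doc in the current directory,
# waiting up to 60 seconds for each PDF instead of a fixed sleep.
convert_all() {
    for i in *.doc; do
        echo "$i"
        pdf="${i%.doc}.pdf"
        rm -f "$pdf"    # remove any stale PDF so we wait for the new one
        doc2pdf "$i"
        wait_for_file "$pdf" 60 || echo "timed out waiting for $pdf" >&2
    done
}
```

Run convert_all from the directory holding the .doc files.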

More details are available here:
http://www.togaware.com/linux/survivor/Convert_MS_Word.html
