Convert TIFF images to PDF

ocr

#1

Major browsers have limited support for displaying TIFF files inline (most of them offer to download the original TIFF instead).
But we need to show the content of TIFF files in Human Tasks (e.g. for data extraction).
A widely used solution is to convert the original TIFF file into a PDF and show the converted PDF inside the Human Task.
We usually use tiff2pdf, a command-line tool that ships with the libtiff utilities on Linux.
Below you can find an example of a machine config that converts a TIFF file into a PDF and uploads the converted PDF to S3.

<var-def name="convertedPDFLink">
	<script return="convert(document_link.toString())"><![CDATA[
		import java.io.BufferedReader;
		import java.io.File;
		import java.io.IOException;
		import java.io.InputStreamReader;
		import java.net.URL;

		import com.google.common.io.Files;
		import org.apache.commons.io.FileUtils;
		import org.apache.commons.io.FilenameUtils;

		// Downloads the TIFF behind documentLink, converts it with tiff2pdf
		// and returns the resulting PDF file.
		static File convert(String documentLink) throws IOException, InterruptedException {
			// Work in a fresh temporary directory for each document.
			File baseDir = Files.createTempDir();
			File inputFolder = new File(baseDir, "input");

			// Keep the original file name; fall back to a default if the URL path has none.
			URL documentUrl = new URL(documentLink);
			String inputFileName = FilenameUtils.getName(documentUrl.getPath());
			if (inputFileName == null || inputFileName.isEmpty()) {
				inputFileName = "input.tiff";
			}

			// copyURLToFile creates the missing parent directory (input/) itself.
			File inputFile = new File(inputFolder, inputFileName);
			FileUtils.copyURLToFile(documentUrl, inputFile);

			File outputFolder = new File(baseDir, "output");
			FileUtils.forceMkdir(outputFolder);
			File result = new File(outputFolder, "output.pdf");

			// tiff2pdf usage: tiff2pdf [options] input.tiff, where -o names the output file.
			invoke(new ProcessBuilder(new String[] {
				"tiff2pdf",
				"-o", result.getAbsolutePath(),
				inputFile.getAbsolutePath()
			}));

			// The downloaded TIFF is no longer needed; the PDF is read by the next step.
			FileUtils.forceDelete(inputFolder);
			return result;
		}

		// Runs an external command, echoes its combined stdout/stderr to the log
		// and fails if the command exits with a non-zero code.
		static void invoke(ProcessBuilder builder) throws IOException, InterruptedException {
			builder.redirectErrorStream(true);
			System.out.println(builder.command());
			Process process = builder.start();

			// Drain the output so the child process cannot block on a full pipe.
			BufferedReader in = new BufferedReader(new InputStreamReader(process.getInputStream()));
			try {
				String line;
				while ((line = in.readLine()) != null) {
					System.out.println(line);
				}
			} finally {
				in.close();
			}

			int code = process.waitFor();
			if (code != 0) {
				throw new RuntimeException("Failed to invoke process: " + builder.command() + ". Return code: " + code);
			}
		}

	]]></script>
</var-def>

<var-def name="content">
    <file path="${convertedPDFLink}" type="binary"/>
</var-def>

<var-def name="converted_pdf_link">
    <s3 bucket="temp_bucket">
        <s3-put path="conver/${document_uuid}.pdf" content="${content}" content-type="application/pdf" content-disposition="inline"/>
    </s3>
</var-def>
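The config above hands every document to tiff2pdf. If a job can also receive non-TIFF files, checking the file header before conversion could route those elsewhere. Below is a minimal sketch in plain Java (11+), assuming the file has already been downloaded; the class and method names (`TiffSniffer`, `isTiffHeader`, `isTiffFile`) are hypothetical and not part of WorkFusion. It only recognizes classic TIFF ("II" or "MM" followed by the number 42); BigTIFF files, which use 43, are deliberately not matched.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TiffSniffer {

    // True if the first four bytes are a classic TIFF signature:
    // "II" + 42 (little-endian) or "MM" + 42 (big-endian).
    static boolean isTiffHeader(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        boolean littleEndian = header[0] == 'I' && header[1] == 'I' && header[2] == 42 && header[3] == 0;
        boolean bigEndian = header[0] == 'M' && header[1] == 'M' && header[2] == 0 && header[3] == 42;
        return littleEndian || bigEndian;
    }

    // Reads at most the first four bytes of the file and checks them.
    static boolean isTiffFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return isTiffHeader(in.readNBytes(4));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isTiffHeader(new byte[] {'I', 'I', 42, 0}));   // true
        System.out.println(isTiffHeader(new byte[] {'%', 'P', 'D', 'F'})); // false
        // Optionally check real files passed on the command line.
        for (String arg : args) {
            System.out.println(arg + ": " + isTiffFile(Paths.get(arg)));
        }
    }
}
```

Checking magic bytes rather than the file extension avoids trusting names like `scan.tif` that may actually hold a JPEG or PDF.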

#2

This is a great example, and it works as expected. The only remark is that tiff2pdf needs to be installed on the Linux box where WorkFusion is running.
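Since the script fails only at run time when tiff2pdf is missing, a quick check on the WorkFusion host can save a failed run. A minimal sketch, assuming a Unix-like system: it looks for an executable regular file with the given name in each `PATH` directory of the JVM process (which may differ from your interactive shell's `PATH`). The names `CommandLocator` and `findOnPath` are hypothetical, and the sketch does not check tool versions.

```java
import java.io.File;
import java.util.Optional;

public class CommandLocator {

    // Searches each directory listed in pathValue for an executable regular
    // file named command. Empty PATH entries (meaning "current directory" on
    // Unix) are skipped on purpose.
    static Optional<File> findOnPath(String command, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, command);
            if (candidate.isFile() && candidate.canExecute()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Check tiff2pdf by default, or any tool names passed as arguments.
        String[] tools = args.length > 0 ? args : new String[] {"tiff2pdf"};
        String path = System.getenv("PATH");
        for (String tool : tools) {
            System.out.println(tool + ": "
                + findOnPath(tool, path).map(File::getAbsolutePath).orElse("not found"));
        }
    }
}
```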

Another way to convert TIFF to PDF, which gives the user control over the quality of the output PDF, is to use ImageMagick with Ghostscript. If both utilities are installed on the Linux machine where WF is running, you may try changing the invoke call to something like:

invoke(new ProcessBuilder(new String[] {
    "convert",
    "-limit", "memory", "0",
    "-limit", "map", "0",
    inputFile.getAbsolutePath(),
    "-compress", "jpeg",
    "-quality", "40",
    result.getAbsolutePath()
}));

The -compress and -quality parameters of the convert utility are worth experimenting with to produce a PDF of acceptable quality and size.
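To make that experimenting easier, the command from the reply above can be built with the JPEG quality as a parameter and generated for several values in one go. A minimal sketch: `ConvertCommand` and `build` are hypothetical names, the file paths in `main` are placeholders, and the range check (1 to 100) is an assumption matching the usual JPEG quality scale. The actual process launch is left commented out so the sketch runs without ImageMagick installed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConvertCommand {

    // Builds the ImageMagick "convert" command used above, with the JPEG
    // quality exposed as a parameter instead of the fixed value 40.
    static List<String> build(String input, String output, int quality) {
        if (quality < 1 || quality > 100) {
            throw new IllegalArgumentException("quality must be between 1 and 100: " + quality);
        }
        return new ArrayList<>(Arrays.asList(
            "convert",
            "-limit", "memory", "0",
            "-limit", "map", "0",
            input,
            "-compress", "jpeg",
            "-quality", String.valueOf(quality),
            output));
    }

    public static void main(String[] args) {
        // One output file per quality level, for side-by-side comparison of size and legibility.
        for (int quality : new int[] {20, 40, 60, 80}) {
            List<String> command = build("/tmp/input.tiff", "/tmp/output-q" + quality + ".pdf", quality);
            System.out.println(String.join(" ", command));
            // new ProcessBuilder(command).inheritIO().start().waitFor(); // uncomment where ImageMagick is installed
        }
    }
}
```

Comparing the resulting file sizes against how readable the text stays in the Human Task is usually enough to pick a value.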


#3

Hey, we have site2image2, so you can convert using an HTTP GET/POST request.