Mike Slinn

ImageMagick Slicing on Ubuntu/WSL

Published 2022-07-28. Last modified 2022-08-03.
Time to read: about 8 minutes.

This site is categorized under Software-Expert, Ubuntu, WSL.

Lawyers like the Microsoft Office software suite; so when I am working on a court case as an expert, I endeavor to provide my clients with Word documents that contain necessary information. I like working in WSL/WSL2 because I can use Windows programs and Ubuntu programs together effectively.

Grab Image, Then Slice

Recently, I used SnagIt, a Windows program, to capture large web pages as single images. Some of these images were quite tall.

The CSS for the web pages made some content invisible on the printed page. Yes, I could have injected CSS using a Chrome plugin like My Style to ensure that all content would be printed, like this:

Injected Style
@media print {
  * { display: initial; visibility: visible; }
}

However, because my results as an expert must be reproducible, I would have to disclose the CSS injection in my report. It is easy to imagine questions in discovery similar to “So, you admit that the pages you provided have been altered from what the software vendor provided because you injected some special code. How can you certify that nothing else was changed?” That question would destroy my credibility no matter how I answered.

That is why I decided to use screen grabs, which would guarantee that the contents of my report would exactly match what had been displayed on the screen, without injecting anything into the web pages.

ImageMagick is preinstalled on Ubuntu Desktop. I used ImageMagick to slice the image captures into smaller page-sized images, so they could be inserted into a Word document.

The Computer Worked Hard

Grabbing such large web pages was a lot of work for my desktop computer. The only programs active during the screen grab process were the Google Chrome browser and SnagIt. I found that 10 GB of RAM and 30% of the GPU's capacity (an NVIDIA GTX 1660 Super) were used.

The screen grab failed if I did not start scrolling from the top of the web page; while it is possible to scrub up and down smaller web pages in order to grab portions of interest, this fails for large pages.

I also found that scrolling too fast caused the screen grabbing process to fail. Clicking and holding the bottom scroll arrowhead at the bottom right of the screen seemed to result in a smooth and optimal scrolling speed. This meant that grabbing large web pages took a few minutes as the page slowly scrolled downward.

Setting Up the Conversion

The Word documents I usually work with are formatted for North American standards. This means one-inch margins on letter-sized paper (8.5" x 11"), which gives a working area of 6.5" x 9", yielding an aspect ratio of 0.72.
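
The arithmetic behind that aspect ratio can be checked with a short sketch. The aspect_ratio function below is a hypothetical helper, not part of the original workflow; it takes the page width, page height and margin in inches, and assumes awk is available.

```shell
# Hypothetical helper: aspect ratio (width / height) of the working area
# that remains after subtracting the margin from all four sides.
aspect_ratio() {
  awk -v w="$1" -v h="$2" -v m="$3" \
    'BEGIN { printf "%.2f\n", (w - 2 * m) / (h - 2 * m) }'
}

# Letter paper with one-inch margins: 6.5 / 9
aspect_ratio 8.5 11 1   # prints 0.72
```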

The tall captured images needed to be sliced into rectangles that fit efficiently into Word documents. The computations are as follows.

  1. Determine the width of a screen grab and save it into W. The ImageMagick identify command does not emit a newline after its output; I have inserted one below for readability:
    Shell
    $ identify -ping -format '%w' ../IMG2005.png
    1536 
    
    $ export W="$( identify -ping -format '%w' ../IMG2005.png )"
  2. Determine the height of a screen grab and save it into H:
    Shell
    $ identify -ping -format '%h' ../IMG2005.png
    
    $ export H="$( identify -ping -format '%h' ../IMG2005.png )"
  3. The width can be divided by the aspect ratio to obtain the desired height of each slice, so the slices fit the working area of the Word documents. I used the bc command-line calculator to compute W / ASPECT_RATIO; with scale=0, bc truncates the result to an integer. The H2 variable contains the computed height of each slice.
    Shell
    $ export ASPECT_RATIO=0.72
    
    $ export H2="$( echo "scale=0 ; $W / $ASPECT_RATIO" | bc )"
  4. Now the image called IMG2005.png can be sliced using ImageMagick’s convert command. The slices are stored in a subdirectory called slices, which must already exist. Because of -scene 0, numbering starts at zero, so the file names are IMG2005-0.jpg, IMG2005-1.jpg, etc.
    Shell
    $ convert IMG2005.png -crop ${W}x${H2} \
      -quality 100% -scene 0 slices/IMG2005-%d.jpg
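
As a cross-check of step 3, the same slice height can be computed with shell integer arithmetic alone. This is only a sketch: slice_height is a hypothetical helper, and it expresses the 0.72 aspect ratio as the integer 72 so that no floating point is needed. Integer division truncates, as bc does with scale=0.

```shell
# Hypothetical helper: slice height = width / (ratio_percent / 100),
# truncated to an integer.
slice_height() {
  width="$1"
  ratio_percent="${2:-72}"   # 72 stands for an aspect ratio of 0.72
  echo $(( width * 100 / ratio_percent ))
}

slice_height 1536   # prints 2133
```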

Automating the Conversion

I wrote the following bash script, which incorporates the above computations. It slices all the images in a directory and saves the results to a second directory.

sliceImages
#!/bin/bash

function help {
  if [ "$1" ]; then echo "Error: $1"; fi
  echo "
$(basename "$0"): slice all images in the given directory and place them into a specified directory,
which will be created if required.
"
  exit 1
}

function setup {
  export ASPECT_RATIO=0.72
  export W="$( identify -ping -format '%w' "$1" )"
  export H="$( identify -ping -format '%h' "$1" )"
  export H2="$( echo "scale=0 ; $W / $ASPECT_RATIO" | bc )"
}

function convert1 {
  FULLNAME=$(basename -- "$1")
  FILENAME="${FULLNAME%.*}"
  FILETYPE="${FULLNAME##*.}"

  convert "$1" \
    -crop "${W}x${H2}" \
    -quality 100% \
    -scene 0 \
    "$DIR_OUTPUT/$FILENAME-%d.png"
}


if [ -z "$1" ]; then help "No directory path for images to be converted was provided."; fi
export DIR_INPUT="$( realpath "$1" )"

if [ -z "$2" ]; then help "No directory path for the image slices to be saved into was provided."; fi
export DIR_OUTPUT="$( realpath "$2" )"

mkdir -p "$DIR_OUTPUT"

# Find every file whose MIME type starts with image/, then slice each one.
# File names that contain a colon are not handled by this pipeline.
find "$DIR_INPUT" -type f -exec file --mime-type {} \+ | awk -F: '{if ($2 ~/image\//) print $1}' |
  while IFS= read -r FILE; do
    setup "$FILE"
    echo "Slicing $FILE into ${W}x${H2} pixels"
    convert1 "$FILE"
  done
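
The script selects images by splitting each line of file --mime-type output at the first colon, so a path that itself contains a colon would be cut short. The sketch below shows one possible alternative filter. image_paths is a hypothetical helper, and the sample lines are made up for illustration; it keeps everything before the last colon that is followed by whitespace and an image/ MIME type.

```shell
# Hypothetical helper: read lines of `file --mime-type` output on stdin and
# print only the paths whose MIME type starts with image/.
image_paths() {
  sed -n 's|^\(.*\):[[:space:]]\{1,\}image/[^[:space:]]\{1,\}$|\1|p'
}

# Made-up sample input in the shape `file --mime-type` produces
printf '%s\n' \
  '/mnt/c/images/IMG2005.png: image/png' \
  '/mnt/c/images/odd:name.jpg:  image/jpeg' \
  '/mnt/c/images/notes.txt:   text/plain' | image_paths
```

In the script, this filter could take the place of the awk stage, with the find command feeding it unchanged.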

Overcoming ImageMagick Processing Limits

Some of the web pages that I needed to grab were quite long, so their images required more computational resources than the default ImageMagick configuration allows. This caused errors such as the following to appear:

convert-im6.q16: no images defined `/mnt/c/images/slices/IMG1466-%d.png' @ error/convert.c/ConvertImageCommand/3229.
convert-im6.q16: cache resources exhausted `/mnt/c/images/IMG1091.png' @ error/cache.c/OpenPixelCache/4095.

ImageMagick defines resource limits in /etc/ImageMagick-6/policy.xml. The default maximum memory is 256 MiB, the default maximum allowable height is 16,000 pixels (16KP), and the default maximum area is 128M pixels. These values are defined by the following entries:

/etc/ImageMagick-6/policy.xml
<policy domain="resource" name="memory" value="256MiB"/>
<policy domain="resource" name="height" value="16KP"/>
<policy domain="resource" name="area" value="128MP"/>
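
A rough way to see why a tall capture trips these defaults is to compare its dimensions against the height and area limits. The sketch below is illustrative only. check_limits is a hypothetical helper, the limits are taken as decimal pixel counts (16,000 and 128,000,000), and the 1536-pixel width is an assumption borrowed from the IMG2005.png example; the 83,703-pixel height is that of the tallest capture described in this article.

```shell
# Default limits from policy.xml, assumed here to be decimal pixel counts
MAX_HEIGHT=16000       # 16KP
MAX_AREA=128000000     # 128MP

# Hypothetical helper: report which default limits a W x H image would exceed.
check_limits() {
  if [ "$2" -gt "$MAX_HEIGHT" ]; then
    echo "height $2 exceeds $MAX_HEIGHT"
  fi
  area=$(( $1 * $2 ))
  if [ "$area" -gt "$MAX_AREA" ]; then
    echo "area $area exceeds $MAX_AREA"
  fi
}

check_limits 1536 83703
```

Under these assumptions both limits are exceeded, while a single 1536x2133 slice is well within them.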

I changed the maximum memory limit to 2 GiB, the maximum height limit to 10,000,000 pixels (10MP), and the maximum area limit to 2G pixels with these entries:

/etc/ImageMagick-6/policy.xml
<policy domain="resource" name="memory" value="2GiB"/>
<policy domain="resource" name="height" value="10MP"/>
<policy domain="resource" name="area" value="2GP"/>

Alternatively, I could have simply commented out the limits by wrapping them in an XML comment, as shown below.

/etc/ImageMagick-6/policy.xml
<!--
<policy domain="resource" name="memory" value="256MiB"/>
<policy domain="resource" name="height" value="16KP"/>
<policy domain="resource" name="area" value="128MP"/>
-->

The largest web page to be sliced was converted to a very tall image, which was 83,703 pixels high. It was sliced into 40 images.
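
The slice count follows from the image height and the slice height. As a sketch, assuming this capture was 1536 pixels wide like IMG2005.png (which gives a slice height of 2133 pixels), the count is the image height divided by the slice height, rounded up because the last slice is only partly filled. slice_count is a hypothetical helper.

```shell
# Hypothetical helper: ceiling(image height / slice height) in integer arithmetic
slice_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

slice_count 83703 2133   # prints 40
```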

Word Macro

A Word macro is also needed to insert the images into the currently open Word document in alphabetical order. I modified this one.

Microsoft Word Macro
Sub insertImages()
    Dim intResult As Integer
    Dim strPath As String
    Dim strFolderPath As String
    Dim objFSO As Object
    Dim objFolder As Object
    Dim objFile As Object
    Dim i As Integer

    intResult = Application.FileDialog(msoFileDialogFolderPicker).Show
    'Proceed only if the user did not cancel the dialog
    If intResult <> 0 Then
        'Get the folder that the user selected
        strFolderPath = Application.FileDialog(msoFileDialogFolderPicker).SelectedItems(1)
        'Create an instance of the FileSystemObject
        Set objFSO = CreateObject("Scripting.FileSystemObject")
        'Get the folder object
        Set objFolder = objFSO.GetFolder(strFolderPath)
        i = 1
        'Loop through each file in the folder and insert it as an inline picture
        For Each objFile In objFolder.Files
            'get file path
            strPath = objFile.Path
            'insert the image
            Selection.InlineShapes.AddPicture FileName:= _
               strPath, LinkToFile:=False, _
               SaveWithDocument:=True
        Next objFile
    End If
End Sub
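
One thing worth checking when the images are inserted in alphabetical order: with ten or more slices, names produced by a plain %d counter do not sort in numeric order. The sketch below illustrates this with a plain byte-wise sort, which is only an approximation of how Windows orders files. sorted_slice_names is a hypothetical helper. ImageMagick output file names also accept zero-padded counters such as %02d, which avoids the problem.

```shell
# Hypothetical helper: generate slice names using counter format $1 for
# $2 slices, then sort them alphabetically (byte-wise).
sorted_slice_names() {
  i=0
  while [ "$i" -lt "$2" ]; do
    printf "IMG2005-$1.png\n" "$i"
    i=$(( i + 1 ))
  done | LC_ALL=C sort
}

sorted_slice_names '%d' 12 | head -n 4     # ... -0, -1, -10, -11
sorted_slice_names '%02d' 12 | head -n 4   # ... -00, -01, -02, -03
```

With a zero-padded counter, the output name in the slicing script would become "$DIR_OUTPUT/$FILENAME-%02d.png".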

Done!

😁

Thanks to the above automation, I was able to deliver the Word documents containing the sliced web pages to my client soon after they were requested.