Resizing sixteen thousand images

August 5, 2018

Over the years I have collected or taken thousands of photographs, some as large as ten megabytes. Files that size are far bigger than needed for printing 4x6's, and they are cumbersome to put online.

To solve this, I decided to write some scripts that would shrink them all for me automatically.

Here's what I came up with (after many fits and starts):

Step 1: list all the files. The -type f option leaves out directories, which we have no use for:

find . -type f > all_files.txt
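For comparison, here is a minimal Python sketch of the same listing step using os.walk. The function names list_files and write_list are hypothetical, chosen for illustration. Like find's -type f, the sketch keeps only non-directory entries (roughly; symlinks to files are included). It also sorts the result, which find does not do.

```python
import os


def list_files(root="."):
    """Return every non-directory entry under root, roughly like `find root -type f`.

    Paths are joined onto root, so with root="." they look like "./a/b.jpg",
    the same form find prints. The list is sorted for a stable order.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def write_list(paths, list_file="all_files.txt"):
    """Write one path per line, the same layout the find command produces."""
    with open(list_file, "w") as out:
        for path in paths:
            out.write(path + "\n")
```

Calling write_list(list_files(".")) writes all_files.txt. Because the list is built before the output file is opened, all_files.txt itself is left out unless it already existed.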

Step 2:
Split the list of files into chunks of 300 lines each. This way, we'll have an indicator of how far we have progressed as the script runs. By default, split names the pieces xaa, xab, xac, and so on:

split -l 300 all_files.txt
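To see what the split step produces, here is a small Python sketch of the chunking. It assumes split's default naming (the prefix x plus two letters, xaa through xzz), and the function names are hypothetical. It only models those first 676 names, which is plenty here: 16,000 files at 300 per chunk comes to 54 chunks.

```python
from itertools import product
from string import ascii_lowercase


def chunk_names():
    """Return split's default two-letter names: xaa, xab, ..., xaz, xba, ..., xzz."""
    return ["x" + a + b for a, b in product(ascii_lowercase, repeat=2)]


def split_lines(lines, per_chunk=300):
    """Group lines into consecutive (name, chunk) pairs of per_chunk lines each."""
    starts = range(0, len(lines), per_chunk)
    names = chunk_names()
    if len(starts) > len(names):
        # Not modeled here: more than 676 chunks.
        raise ValueError("too many chunks for two-letter names")
    return [(names[n], lines[s:s + per_chunk]) for n, s in enumerate(starts)]
```

For 16,000 paths, split_lines returns 54 chunks named xaa through xcb, with the last one holding the remaining 100 paths. Seeing "processing xcb" on the command line therefore means the job is nearly done.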

Step 3:
Write the driver script, go.sh, which runs our converter on every file in every chunk:

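$ cat go.sh
#!/usr/bin/env bash
# Make newline the only separator, so each line of a chunk is read as one whole
# file name, spaces (even leading or trailing ones) included.
IFS=$'\n'

# Loop through all the split files (xaa, xab, ...; 300 lines each of files to process).
# Globbing x* directly is safer than parsing the output of ls.
for x in x*; do
  echo "processing $x"
  # For each line in the chunk, run do_convert on it (it must be on your PATH).
  # read -r keeps backslashes intact, and reading line by line instead of using
  # $(cat $x) stops names containing * or ? from being expanded as globs.
  # Send stdout and stderr to conversion_log.txt in the parent directory.
  while read -r i; do
    do_convert "$i" </dev/null   # </dev/null: don't let it eat the chunk list
  done < "$x" >>../conversion_log.txt 2>&1
done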
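If you prefer Python for the driver, here is a minimal sketch of the same loop. The callback convert_one is a hypothetical stand-in for do_convert, and call_do_convert assumes do_convert is executable and on your PATH. As a small extra that go.sh does not have, the progress line also shows how many chunks there are in total.

```python
import glob
import subprocess


def run_chunks(convert_one, pattern="x*", log_path="../conversion_log.txt"):
    """Feed every path listed in the chunk files (xaa, xab, ...) to convert_one.

    convert_one(path, log) is a hypothetical callback standing in for do_convert;
    it gets the open log file so it can send its output there.
    """
    chunk_files = sorted(glob.glob(pattern))
    with open(log_path, "a") as log:
        for n, chunk in enumerate(chunk_files, start=1):
            # Progress indicator, like "processing $x" in go.sh, plus a count.
            print(f"processing {chunk} ({n} of {len(chunk_files)})")
            with open(chunk) as f:
                for line in f:
                    # Strip only the newline, so spaces in names survive.
                    path = line.rstrip("\n")
                    if path:
                        convert_one(path, log)


def call_do_convert(path, log):
    """Run the external do_convert script (assumed to be on PATH).

    Its stdout and stderr both go to the log, like >>log 2>&1 in go.sh.
    """
    subprocess.run(["do_convert", path], stdout=log, stderr=subprocess.STDOUT,
                   stdin=subprocess.DEVNULL)
```

Run run_chunks(call_do_convert) from the directory holding the chunk files. Passing the path as a list element means no shell ever re-splits it, so spaces in file names need no special handling.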

Step 4:
The loop above relies on a program we haven't written yet, do_convert, which processes a single file. Here it is:

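$ cat do_convert
#!/usr/bin/env bash

# Assign some constants. random and suffix (Step 5) must also be on your PATH.
UUID=$(random)
NEWDIRECTORY="../smallerpics/"
SUFFIX=$(suffix "$1")
# file -b prints only the description (e.g. "JPEG image data, ..."), not the
# file name, so a non-image whose name contains "image" isn't mistaken for one.
COUNT_OF_WORD_IMAGE=$(file -b "$1" | grep -c "image")

echo "inspecting $1"
if [ "$COUNT_OF_WORD_IMAGE" -gt 0 ]; then
  echo "converting \"$1\" to $NEWDIRECTORY$UUID.$SUFFIX"

  # The following line uses "convert" from the ImageMagick suite.
  # "800x800>" fits the image inside an 800x800 box, so the larger side ends up
  # no more than 800 pixels; ">" means shrink only, never enlarge.
  convert "$1" -resize "800x800>" "$NEWDIRECTORY$UUID.$SUFFIX"
fi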
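Here is a sketch of the arithmetic behind ImageMagick's "800x800>" geometry. That geometry fits an image inside an 800x800 box, keeps its aspect ratio, and (because of the ">") only ever shrinks. A bare "800>" gives only a width, so ImageMagick picks the height to preserve the aspect ratio, and a tall photo can end up more than 800 pixels high. The function names fit_within and width_only are hypothetical. ImageMagick's own rounding may differ from Python's round() by a pixel.

```python
def fit_within(width, height, max_side=800):
    """Sketch of "800x800>": scale so both sides fit in a max_side box,
    keeping the aspect ratio, and never enlarge."""
    if width <= max_side and height <= max_side:
        return width, height  # ">": already small enough, leave it alone
    scale = min(max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def width_only(width, height, max_width=800):
    """Sketch of a bare "800>": only the width is limited; height follows the aspect ratio."""
    if width <= max_width:
        return width, height
    return max_width, max(1, round(height * max_width / width))
```

A 4000x3000 landscape photo becomes 800x600 either way. A 3000x4000 portrait photo becomes 600x800 under fit_within, but 800x1067 under width_only.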

Step 5:
do_convert in turn relies on two small helper programs: random, which prints a UUID (we'll use that as the new file name), and suffix, which prints a file's extension. Here they are:

$ cat random
#!/usr/bin/env python3

# Print a random UUID; do_convert uses it as the new file name.
import uuid
print(uuid.uuid4())
$ cat suffix
#!/usr/bin/env python3

import re
import sys

# Looks complex, but it reverses the input, takes the text up to the first period,
# then reverses that. This gives us the suffix of nasty names like:
# /a/b/c/blah_blah/ bleh/ foo/ /-/ bar.jpg
#    gives us: "jpg"
print(re.match("^[^.]+", sys.argv[1][::-1]).group(0)[::-1])
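The reversal trick works well for names that have an extension. Here is a small Python sketch comparing it with the standard library's os.path.splitext, to show where the two differ. When the last path component has no period, the regex keeps reading into the directory names. On a name ending in a period, re.match finds nothing, so the suffix script would stop with an AttributeError. The function names are hypothetical, and suffix_regex returns "" in that case instead of crashing.

```python
import os
import re


def suffix_regex(path):
    """The suffix script's trick: reverse, take text up to the first period, reverse back.

    Returns "" where the original script would crash (re.match finds nothing).
    """
    match = re.match(r"^[^.]+", path[::-1])
    return match.group(0)[::-1] if match else ""


def suffix_splitext(path):
    """The same job with the standard library; "" when the name has no extension."""
    return os.path.splitext(path)[1].lstrip(".")
```

For the example path in the script's comment, both return "jpg". For ./photos.2018/beach, suffix_regex returns "2018/beach", which do_convert would paste into the output path. suffix_splitext returns an empty string.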

Also note that I used the ImageMagick suite, specifically its convert program, to do the actual image conversions.

Step 6:
Now create the output directory (convert will not create it for us) and run the command:

$ mkdir -p ../smallerpics
$ ./go.sh

This appends everything the scripts print to ../conversion_log.txt. It's good to keep an eye on that file (tail -f ../conversion_log.txt in a second terminal works well). Also watch the "processing xaa" lines on the command line, which tell you which chunk is running. Like I said, it went in fits and starts as I ran into issues. For example, at first I didn't have double quotes around the file name in my scripts, so any name containing spaces was broken into pieces. Similarly, as mentioned above, I needed to set IFS=$'\n' so my loop would split only on newlines rather than on all whitespace.
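The whitespace problem is easy to see in a few lines of Python. This sketch only approximates bash's word splitting (it ignores globbing and quoting), and the function names are hypothetical. Splitting on every space, tab, and newline, as an unquoted $(cat file) does with the default IFS, breaks a name with spaces into pieces. Splitting only on newlines keeps each path whole.

```python
def split_default_ifs(text):
    """Roughly bash's default word splitting: break on any run of spaces, tabs, or newlines."""
    return text.split()


def split_newlines_only(text):
    """Roughly word splitting with IFS set to a newline: break only at newlines, drop empty fields."""
    return [line for line in text.split("\n") if line]
```

With the listing "./summer 2017/beach day.jpg" followed by "./cat.jpg", the first function yields four fragments, and do_convert would try to open "./summer" and "2017/beach" as if they were files. The second yields the two real paths.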

To make videos smaller with ffmpeg, this command line will do the trick. It encodes with libx264 (an H.264 encoder with good compression) and makes the video smaller but doesn't ruin its aspect ratio. The veryfast preset trades some compression efficiency for encoding speed.

ffmpeg -i INPUT_FILE -c:v libx264 -s 320x240 -preset veryfast OUTPUT_FILE.mp4
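If you want to shrink a batch of videos from a script, one option is to call the same ffmpeg command from Python. This is a sketch: the function names are hypothetical, and the flags are exactly those in the command above. Passing the arguments as a list rather than as one shell string means file names with spaces need no quoting at all.

```python
import subprocess


def shrink_video_cmd(input_file, output_file):
    """Build the ffmpeg command above as an argument list (no shell involved)."""
    return ["ffmpeg", "-i", input_file, "-c:v", "libx264",
            "-s", "320x240", "-preset", "veryfast", output_file]


def shrink_video(input_file, output_file):
    """Run ffmpeg; raises subprocess.CalledProcessError if it exits non-zero."""
    subprocess.run(shrink_video_cmd(input_file, output_file), check=True)
```

For example, shrink_video("home movie.avi", "home movie.mp4") works even though both names contain a space.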

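To create tiny thumbnails, where the priority is the smallest possible file that still gives a sense of what the photo shows, use the command below. It removes all metadata and scales the image to fit within 100x100 pixels, keeping its aspect ratio. It then reduces the image to only 4 colors, using Floyd-Steinberg dithering to approximate the missing shades. The result is a very small PNG, written into the t directory, which must already exist.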

convert "$i" -strip -thumbnail "100x100" -dither FloydSteinberg -colors 4 "t/$i.png"

Contact me at renaissance.nomad (at) google.com