Resizing sixteen thousand images
Over the years I have collected or taken thousands of photographs, ranging in size up to about ten megabytes. The large files are unnecessarily large for simply printing 4x6's, not to mention that putting those online becomes cumbersome.
In order to solve this problem, I decided to write some scripts that would do everything for me.
Here's what I came up with (after many fits and starts):
Step 1: list all the files
find . > all_files.txt
Step 2:
split up the list of files into sections (this way, we'll have an
indicator of how far we have progressed as we run through our script). Split
into files with 300 lines each:
split all_files.txt -l300
Step 3:
Write a file we'll run to process our files:
$ cat go.sh
#!/usr/bin/bash
# make newlines the only separator
# This way, each line in the file will get processed as an entirety. Otherwise,
# our loop will break tokens up by whitespace.
IFS=$'\n'
# loop through all the split files (consisting of 300 lines each of files to process)
for x in $(ls x*); do
echo "processing $x"
# for each line in the file, run "do_convert" on it. Send stdout and stderr logs to conversion_log.txt
(for i in $(cat $x); do do_convert "$i"; done) >>../conversion_log.txt 2>&1
done
Step 4:
In our previous code we implied the existence of a program that processes each
file: do_convert. Let's look at that:
$ cat do_convert
#!/usr/bin/bash
# assign some constants.
UUID=$(random)
NEWDIRECTORY="../smallerpics/"
SUFFIX=$(suffix "$1")
COUNT_OF_WORD_IMAGE=$(file "$1"|grep -c "image")
echo "inspecting $1"
if [ "$COUNT_OF_WORD_IMAGE" -gt "0" ]; then
echo "converting \"$1\" to $NEWDIRECTORY$UUID.$SUFFIX"
# The following line uses "convert" from the Imagemagick suite.
# The resize syntax - "800>" - says to make the larger side no more than 800 pixels
convert "$1" -resize "800>" $NEWDIRECTORY$UUID.$SUFFIX
fi
Step 5:
This file implies the existence of two other files: random, which gives us a
uuid (we'll use that as our new filename), and suffix (which gives us the
suffix of a file). Here they are:
$ cat random
#!/usr/bin/python
import uuid
print uuid.uuid4()
$ cat suffix
#!/usr/bin/python
import re
import sys
# looks complex, but it's reversing the input, getting text up to the first period,
# then, reversing that. Gives us the suffix to nasty stuff like:
# /a/b/c/blah_blah/ bleh/ foo/ /-/ bar.jpg
# gives us: "jpg"
print (re.match("^[^.]+", (sys.argv[1][::-1])).group(0))[::-1]
Also note that I was using the Imagemagick suite to do the actual image conversions. The program I used was called "convert"
Step 6:
Now we simply run the command.
$ ./go.sh
This will put data into the log file mentioned previously. It's good to keep an eye on that, as well as what the program is printing on the command line about which file it is processing. Like I said, it went in fits and starts as I encountered issues. For example, I didn't have double-quotes around the file I was processing in my scripts at first. This led to issues when my code would try to parse something having spaces. Similarly, as mentioned above, I needed to include IFS=$'\n' so my loop would only break on newlines rather than whitespace when processing.
To make videos smaller using ffmpeg, this command line will do the trick. It uses lib264, which has good video compression, and makes the video smaller but doesn't ruin its aspect ratio. Following that is a line which keeps the aspect ratio but sets the width (letting the height vary) and trimming the start and end.
ffmpeg -i INPUT_FILE -c:v libx264 -s 320x240 -preset veryfast OUTPUT_FILE.mp4
ffmpeg -ss 00:01:00 -to 00:02:00 -i INPUT_FILE -c:v libx264 -vf "scale=640:-1" -preset veryfast OUTPUT_FILE.mp4
To create tiny-sized thumbnails, where the priority is smallest size with just a sense of the content of the photo. It uses only 4 colors in the output photo and removes all metadata, highly compresses the file, and makes it 100x100 in size.
convert $i -strip -thumbnail "100x100" -dither FloydSteinberg -colors 4 t/$i.png