It’s common to need to organize large file sets (pictures, papers, etc.). There are specialized programs, but they can get hairy quickly, and simple prototypes are easy to think of, e.g. a special directory containing one subdirectory per tag, with ln(1) used to tag files by storing links under those subdirectories. This has a few drawbacks, for example:
- Not resilient to file renaming (or at least requires duplicating files);
- Some queries (e.g. AND) are slightly clunky to write.
There’s another, more modest approach, which solves those issues at a minor cost: encode the tags in the filename, e.g.:
tag0_tag1_-___-<original filename>
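Because the tags live in the name itself, checking whether a file carries a given tag needs nothing but the shell. The sketch below assumes _ between tags and _-___- before the original filename; has_tag is a hypothetical helper name, not part of the scripts further down. Wrapping the tag part in _ on both sides makes only whole tags match, so beach does not match a beachwear tag.

```shell
#!/bin/sh
# has_tag (hypothetical helper): succeed iff the file name in $1 carries
# the tag $2, assuming the "<tag0>_<tag1>_-___-<original filename>" scheme.
sep='_-___-'   # separator between the tags and the original file name
tsep='_'       # separator between tags

has_tag() {
    b=${1##*/}                   # drop any leading directories
    case $b in
    *"$sep"*) ;;                 # tagged file: keep going
    *) return 1 ;;               # untagged file
    esac
    tags=${b%%"$sep"*}           # keep what precedes the first $sep
    # Wrap both sides in $tsep so that only whole tags match
    case "$tsep$tags$tsep" in
    *"$tsep$2$tsep"*) return 0 ;;
    *) return 1 ;;
    esac
}

# Usage
has_tag photos/beach_family_-___-img001.jpg family && echo yes
has_tag beachwear_-___-img002.jpg beach || echo no
```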
The tag separator (_) is arbitrary; the separator between the tags and the (original) filename (_-___-, which looks like -___- but conveniently doesn’t start with a -, so it can't be mistaken for a command-line option) should be unique enough. So, we trade an “ugly” filename and a few false positives (e.g. a query for tag0 also matching tag01; expected to be practically insignificant at worst) for a tagging system which is:
- file system independent (almost: “long” filenames can be an issue on older systems);
- reasonably OS-independent;
- resilient to renaming;
- doesn’t require installing, updating or maintaining additional software;
- code-free (almost: a few scripts can help, e.g. to ease automated processing);
- trivial to migrate away from if need be;
- etc.
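As for migrating away, it only means cutting each name up to its first _-___-. A minimal sketch, assuming the tagged files sit directly in one directory; untag is a hypothetical name, and a file is skipped rather than overwritten when its original name is already taken.

```shell
#!/bin/sh
# untag (hypothetical helper): rename every "<tags>_-___-<original filename>"
# file of directory $1 back to "<original filename>".
untag() (
    sep='_-___-'
    cd "$1" || exit 1            # subshell: the cd does not leak to the caller
    for f in *"$sep"*; do
        [ -e "$f" ] || continue  # no tagged file: the glob stayed unexpanded
        orig=${f#*"$sep"}        # drop everything up to the first $sep
        if [ -e "$orig" ]; then
            echo "untag: skipping $f: $orig already exists" 1>&2
            continue
        fi
        mv -- "$f" "$orig"
    done
)

# Usage: untag /home/$USER/photos
```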
Arbitrary queries can be performed:
- OR: ... | grep -E '(tag0|tag1).*_-___-';
- AND: ... | grep 'tag0.*tag1.*tag5.*_-___-' (just sort the tags alphabetically);
- NOT: ... | grep -v 'tag0.*_-___-'.
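The three queries can be tried on a throwaway directory. In this sketch ls(1) stands in for the ... of the queries (locate would do as well), and the tags beach and family are made up for the example.

```shell
#!/bin/sh
# Throwaway directory with three tagged files and an untagged one
dir=$(mktemp -d)
for f in beach_family_-___-img001.jpg family_-___-img002.jpg \
         beach_-___-img003.jpg untagged.jpg; do
    touch "$dir/$f"
done

# ls(1) stands in for the "..." of the queries
or=$(ls "$dir" | grep -E '(beach|family).*_-___-')          # beach OR family
and=$(ls "$dir" | grep 'beach.*family.*_-___-')             # beach AND family (sorted)
not=$(ls "$dir" | grep '_-___-' | grep -v 'beach.*_-___-')  # tagged, NOT beach

printf 'OR:\n%s\nAND:\n%s\nNOT:\n%s\n' "$or" "$and" "$not"
rm -r "$dir"
```

A file tagged beachwear would also satisfy the beach queries: that is the kind of false positive mentioned earlier.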
Finally, locate(1)/updatedb(8) naturally provide central access to all tagged files, though on a per-disk basis: by default, only the root file system is indexed:
$ updatedb
$ locate _-___-
...
If the files are on an external hard drive, you’ll “have to” create a dedicated database (and perhaps write a small wrapper to ease querying):
$ updatedb -l 0 -o ~/.tmp0-updatedb.db -U /mnt/tmp0/
$ locate -d ~/.*.db _-___-
$ locate -d ~/.*.db _-___-|wc -l
199
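The locate -d ~/.*.db call above works as long as there is a single database: with several, the shell expands the glob into several arguments, and locate then takes all but the first one as search patterns. locate's -d option accepts a :-separated list of databases instead, so a small wrapper could look as follows (join_dbs and tagloc are hypothetical names; the databases are assumed to follow the ~/.*.db naming used above).

```shell
#!/bin/sh
# join_dbs: print its arguments separated by ':' (the format of locate -d).
# The function body is a subshell, so changing IFS does not leak out.
join_dbs() (
    IFS=:
    printf '%s\n' "$*"
)

# tagloc (hypothetical wrapper): query all ~/.*.db databases at once,
# e.g. "tagloc _-___-" or "tagloc _-___- | grep beach".
tagloc() {
    locate -d "$(join_dbs "$HOME"/.*.db)" "$@"
}
```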
For the record, most modern file systems support filenames up to 255 bytes long, for instance:
$ getconf NAME_MAX /
255
$ mount |grep ' / '
/dev/sda6 on / type ext4 (rw,relatime)
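Since the tags eat into that limit, a tagged name can be checked before renaming. A minimal sketch relying on getconf NAME_MAX as above (fits_name_max is a hypothetical name); note that the limit is in bytes, so accented characters, which take several bytes in UTF-8, count more than once.

```shell
#!/bin/sh
# fits_name_max (hypothetical helper): succeed iff the file name $2 (a bare
# name, no '/') is at most NAME_MAX bytes long on the file system holding
# directory $1.
fits_name_max() {
    max=$(getconf NAME_MAX "$1")
    # wc -c counts bytes (not characters); tr strips the padding some
    # wc(1) implementations add
    len=$(printf '%s' "$2" | wc -c | tr -d ' ')
    [ "$len" -le "$max" ]
}

# Usage: check a tagged name before renaming to it
fits_name_max / 'beach_family_-___-img001.jpg' && echo ok
```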
A Huffman-like coding, with printable characters instead of bits, could be used to compress the tags further if need be (the most frequent tags getting the shortest names). This can be approximated by hand to a reasonable degree.
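Doing it by hand starts with knowing which tags are the most frequent. A minimal sketch (tagfreq is a hypothetical name) counting tags over tagged paths read on standard input, e.g. from locate _-___- or ls:

```shell
#!/bin/sh
# tagfreq (hypothetical helper): read paths on stdin, one per line, and
# print "<count> <tag>" for every tag, most frequent first.
tagfreq() {
    # Keep the tag part of tagged names only (-n + p: untagged lines vanish)
    sed -n 's,.*/,,; s/_-___-.*//p' |
        tr -s '_' '\n' |         # one tag per line
        sort | uniq -c |         # count identical tags
        sort -rn |               # most frequent first
        awk '{ print $1, $2 }'   # normalize uniq -c's padding
}

# Usage: locate _-___- | tagfreq | head
```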
As I need to (manually) tag dense directories (c. 4000 files), I’ve written two small scripts to help with automated processing:
- The first, tag, adds/removes tags to/from a file name (trimming duplicates and sorting tags alphabetically);
- The second, batchtag, relies on the first to tag batches of files stored in a single directory: it opens each file with a viewer and prompts for its tags.
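#!/bin/sh
# tag: add (default) or remove (-r) tags to/from a file name.
set -e
# <tags>$sep<filename>
sep='_-___-'
# <tags>=<tag0>$tsep<tag1>$tsep...
# XXX assumed to be one byte long later
tsep='_'

usage() {
    echo "$(basename "$0")" '[-dr] <path/to/file> [tags]' 1>&2
    exit 1
}

dryrun=mv   # -d: print the mv(1) command instead of running it
rm=         # -r: remove the given tags instead of adding them
while getopts "dr" opt; do
    case "$opt" in
    d) dryrun="echo mv";;
    r) rm=1;;
    *) usage;;
    esac
done
shift $((OPTIND-1))

# Check for a file argument only once the options have been consumed
if [ -z "$1" ]; then usage; fi
if echo "$1" | grep -q '[[:space:]]'; then
    echo "No whitespace allowed in filename: '$1'" 1>&2
    exit 1
fi
f=$1
shift; tags="$*"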
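# One tag per line: split on $tsep, spaces and tabs
tags2lines() {
    tr -s "$tsep"' \t' '\n'
}
# Sort, deduplicate and join lines back with $tsep; substr(s, 2)
# drops the leading $tsep (2 = 1+length($tsep)); NF skips empty lines
lines2tags() {
    sort -u | awk 'NF { s = s "'"$tsep"'" $1 } END { print substr(s, 2) }'
}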
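d=$(dirname "$f")
b=$(basename "$f")
# Split "<tags>$sep<filename>" into two lines: tags (maybe empty), filename
echo "$b" | awk -F"$sep" '{
    if ($2 == "") printf("\n%s\n", $1);
    else printf("%s\n%s\n", $1, $2); }' | {
    read -r ts; read -r fn
    if [ -z "$rm" ]; then
        # Merge the new tags with the existing ones
        tags=$(echo $tags $ts | tags2lines | lines2tags)
    else
        # Drop existing tags equal (-x, not a mere substring) to a given one
        tags=$(echo $ts | tags2lines | grep -vxF "$(echo $tags | tags2lines)" | lines2tags)
    fi
    # A file left without tags gets back its bare name
    if [ -n "$tags" ]; then nb=$tags$sep$fn; else nb=$fn; fi
    # Compare base names: "$d/..." may differ from "$f" ("./x" vs "x")
    if [ "$b" != "$nb" ]; then
        $dryrun "$f" "$d/$nb"
    fi
}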
#!/bin/sh
# batchtag: interactively tag every file of a directory, one at a time.
# e.g.: sh batchtag -s "feh -." /home/$USER/photos/
# TODO: batch tag renaming
set -e
# <tags>$sep<filename>
sep='_-___-'
# <tags>=<tag0>$tsep<tag1>$tsep...
# XXX assumed to be one byte in ./tag
tsep='_'
# Look for tag(1) in the current directory first
PATH=.:$PATH
if ! command -v tag >/dev/null; then
    echo 'tag(1) not found in $PATH' 1>&2
    exit 1
fi
skip=   # -s: skip files that are already tagged
if [ "$1" = "-s" ]; then skip=1; shift; fi
# <prog> is the viewer, e.g. "feh -." for images, "xpdf" for .pdfs, etc.
if [ -z "$2" ]; then
    echo "$(basename "$0")" '[-s] <prog> <path/to/dir/>' 1>&2
    exit 1
fi
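for x in "$2"/*; do
    if [ -n "$skip" ] && echo "$x" | grep -q -- "$sep"; then
        continue
    fi
    # $1 is left unquoted on purpose, so that "feh -." splits into prog + args
    $1 "$x" &
    pid=$!
    echo "$x"; printf 'new set of tags: '
    # ^D (end of input) stops the whole batch
    if ! read -r ts; then kill "$pid" 2>/dev/null || true; exit 0; fi
    # $pid may have exited already (e.g. viewer closed by hand)
    kill "$pid" 2>/dev/null || true
    # An empty answer leaves the file untouched; $ts is split into tags on purpose
    if [ -n "$ts" ]; then tag "$x" $ts; fi
done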
Comments
By email, at mathieu.bivert chez: