2018, Toulouse, France
by
M. Bivert
It’s common to need to organize large filesets (pictures, papers, etc.). There are specialized programs, but they can get hairy quickly, and simple prototypes are easy to think of, e.g.: have a special directory containing one subdirectory per tag, and use ln(1) to tag files, by storing links under such directories. There are a few drawbacks, for example:
- Not resilient to file renaming (or at least requires duplicating files);
- Some queries (e.g. AND) are slightly clunky to write.
There’s another, more modest approach that solves those issues at a minor cost: encode the tags in the filename, e.g. tag0_tag1_-___-<original filename>.
The tag separator (_) is arbitrary; the separator between the tags and the (original) filename (_-___-, which looks like a -___- but conveniently doesn’t start with a -) should be unique enough. So, we trade an “ugly” filename and a few false positives (expected to be practically insignificant at worst) for a tagging system which is:
- file system independent (almost: “long” filenames can be an issue on old stuff);
- reasonably OS-independent;
- resilient to renaming;
- free of additional software to install, update or maintain;
- code-free (almost: some bits can be helpful, e.g. to ease automated processing);
- trivial to migrate away from if need be;
- etc.
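To illustrate how little code the scheme needs, here is a minimal sketch that takes a tagged name apart using only POSIX parameter expansion. The function names tags_of and name_of are hypothetical helpers, not part of any existing tool, and the sketch assumes the separator appears only once in the name.

```sh
#!/bin/sh
# Separator between the tags and the original filename (from the text above).
sep='_-___-'

# Print the tags of a tagged filename, one per line (nothing if untagged).
tags_of() {
    case "$1" in
        *"$sep"*) printf '%s\n' "${1%%$sep*}" | tr -s '_' '\n' ;;
    esac
}

# Print the original filename, i.e. whatever follows the first separator.
name_of() {
    case "$1" in
        *"$sep"*) printf '%s\n' "${1#*$sep}" ;;
        *)        printf '%s\n' "$1" ;;
    esac
}
```

For instance, tags_of 'cat_pet_-___-a.jpg' prints cat and pet on separate lines, and name_of on the same argument prints a.jpg, which is also all it takes to migrate away from the scheme.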
Arbitrary queries can be performed:
- OR: ... | grep -E '(tag0|tag1).*_-___-';
- AND: ... | grep 'tag0.*tag1.*tag5.*_-___-' (list the tags in alphabetical order, which is how they should be stored);
- NOT: ... | grep -v 'tag0.*_-___-'.
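If substring false positives (e.g. cat matching category) become annoying, the same queries can be made stricter. The following is only a sketch: tag_re, tag_or, tag_and and tag_not are hypothetical names, the filters read one filename or path per line on stdin, and tags are assumed to contain no regex metacharacters.

```sh
#!/bin/sh
# Build an ERE matching names that carry any of the given tags as a whole tag:
# the tag must start the name (or follow a / or a _), and be followed by _,
# optionally more tags, and the rest of the _-___- separator.
tag_re() {
    alt=$(IFS='|'; printf '%s' "$*")
    printf '%s' "(^|/|_)($alt)_([^/]*_)?-___-"
}

# OR: keep names carrying at least one of the given tags.
tag_or()  { grep -E  "$(tag_re "$@")"; }

# NOT: keep names carrying none of the given tags.
tag_not() { grep -vE "$(tag_re "$@")"; }

# AND: chain one filter per tag, so the order of the tags does not matter.
tag_and() {
    if [ $# -eq 0 ]; then cat; return; fi
    first=$1; shift
    tag_or "$first" | tag_and "$@"
}
```

Usage would look like ls | tag_and cat pet, or ls | tag_or cat dog | tag_not blurry.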
Finally, locate(1)/updatedb(8) naturally provide central access to all tagged files, though on a per-disk basis: by default, only files on the root file system are managed:
$ updatedb
$ locate _-___-
...
If the files are on an external hard drive, you’ll “have to” create a dedicated database (and perhaps write a small wrapper to ease querying):
$ updatedb -l 0 -o ~/.tmp0-updatedb.db -U /mnt/tmp0/
$ locate -d ~/.*.db _-___-
$ locate -d ~/.*.db _-___-|wc -l
199
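The small wrapper mentioned above could look like the following sketch. It assumes one database per disk, named ~/.<disk>-updatedb.db as in the example, and it relies on locate accepting a colon-separated list of databases after -d (this is the case for GNU findutils and mlocate as far as I know; check your locate(1)). The names dbpath and tagged are hypothetical.

```sh
#!/bin/sh
sep='_-___-'

# Print the databases found in directory $1, joined with ':'.
dbpath() {
    p=
    for db in "$1"/.*-updatedb.db; do
        [ -f "$db" ] || continue   # an unmatched glob stays literal: skip it
        p="${p:+$p:}$db"
    done
    printf '%s\n' "$p"
}

# List every tagged file known to the databases in $1 (default: $HOME).
tagged() {
    dbs=$(dbpath "${1:-$HOME}")
    [ -n "$dbs" ] || { echo "no database found" 1>&2; return 1; }
    locate -d "$dbs" "$sep"
}
```

Building the list explicitly avoids surprises when the shell glob expands to more than one file: passed as-is, the second database would be taken as a search pattern.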
For the record, most modern file systems support filenames up to 255 bytes long, for instance:
$ getconf NAME_MAX /
255
$ mount |grep ' / '
/dev/sda6 on / type ext4 (rw,relatime)
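Before renaming, it is easy to check that a tagged name still fits within that limit. A minimal sketch, where fits is a hypothetical helper taking the tags, the original filename and the limit in bytes:

```sh
#!/bin/sh
# Succeed if <tags>_-___-<filename> is at most $3 bytes long.
# $1: tags (e.g. cat_pet), $2: original filename, $3: limit in bytes.
fits() {
    # wc -c counts bytes; $(( )) strips the padding some wc(1) add
    n=$(($(printf '%s' "${1}_-___-${2}" | wc -c)))
    [ "$n" -le "$3" ]
}
```

A typical call would be fits cat_pet a.jpg "$(getconf NAME_MAX .)" || echo "name too long".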
A Huffman-like coding, using printable characters instead of bits, could be used to compress the tags further if need be. This can be approximated by hand to a reasonable degree.
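Doing this by hand mostly means giving the shortest abbreviations to the most frequent tags, so a frequency count is a natural first step. The sketch below is one way to get it; tagfreq is a hypothetical name, and it reads filenames or paths on stdin, one per line.

```sh
#!/bin/sh
# Print "<count> <tag>" for every tag seen on stdin, most frequent first.
# Names without the _-___- separator are ignored.
tagfreq() {
    awk -F'_-___-' 'NF > 1 {
        sub(/.*\//, "", $1)                  # drop any leading directories
        n = split($1, t, "_")
        for (i = 1; i <= n; i++)
            if (t[i] != "") count[t[i]]++
    }
    END { for (k in count) print count[k], k }' | sort -k1,1nr -k2,2
}
```

With the databases above, locate -d ~/.tmp0-updatedb.db _-___- | tagfreq | head would list the tags most worth shortening.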
False garlic, 2018, Toulouse, France
by
M. Bivert
As I need to (manually) tag dense directories (c. 4000 files), I’ve written two small scripts to help with automated processing:
- The first, tag, adds or removes tags on a file (trimming duplicates and sorting tags alphabetically);
- The second, batchtag, relies on the first one to tag batches of files stored in a single directory.
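#!/bin/sh
# tag: add tags to a file (or remove them, with -r) by renaming it.

set -e

# <tags>$sep<filename>
sep='_-___-'

# <tags>=<tag0>$tsep<tag1>$tsep...
# XXX assumed to be one byte long later
tsep='_'

if [ -z "$1" ]; then
    echo "$(basename "$0") [-dr] <path/to/file> [tags]" 1>&2
    exit 1
fi

# -d: dry run, print the mv command instead of running it
# -r: remove the given tags instead of adding them
dryrun=mv
rm=
while getopts "dr" opt; do
    case "$opt" in
        d) dryrun="echo mv";;
        r) rm=1;;
    esac
done
shift $((OPTIND-1))

if echo "$1" | grep -q '[ ]'; then
    echo "No spaces allowed in filename: '$1'" 1>&2
    exit 1
fi

f=$1
shift; tags="$*"

# Split tags on $tsep and blanks, one tag per line
tags2lines() {
    tr -s "$tsep"' \t' '\n'
}

# Sort, drop duplicates, and join the lines back with $tsep
lines2tags() {
    # XXX 2 = 1+length($tsep)
    sort -u | awk '{ s = s "'$tsep'" $1 } END{ print substr(s, 2) }'
}

d=$(dirname "$f")
b=$(basename "$f")

# Print the current tags (possibly empty), then the original filename
echo "$b" | awk -F"$sep" '{
    if ($2 == "") printf("\n%s\n", $1);
    else          printf("%s\n%s\n", $1, $2); }' | {
    read -r ts; read -r fn
    if [ -z "$rm" ]; then
        tags=$(echo $tags $ts | tags2lines | lines2tags)
    else
        # tag0\|tag1\|...; -x so that only whole tags are removed
        tags=$(echo $tags | sed 's,['$tsep' ]\+,\\\|,g')
        tags=$(echo $ts | tags2lines | grep -vx "$tags" | lines2tags)
    fi
    # No tags left: fall back to the bare filename
    if [ -n "$tags" ]; then new=$tags$sep$fn; else new=$fn; fi
    # Compare basenames, so that an unchanged name is never passed to mv
    if [ "$b" != "$new" ]; then
        $dryrun "$f" "$d/$new"
    fi
}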
#!/bin/sh
# batchtag: open each file of a directory with <prog>, and prompt for its tags.
# e.g.: sh batchtag -s "feh -." /home/$USER/photos/
# TODO: batch tag renaming

set -e

# <tags>$sep<filename>
sep='_-___-'

# <tags>=<tag0>$tsep<tag1>$tsep...
# XXX assumed to be one byte in ./tag
tsep='_'

PATH=.:$PATH:

if ! command -v tag >/dev/null; then
    echo 'tag(1) not found in $PATH' 1>&2
    exit 1
fi

# -s: skip files which are already tagged
skip=
if [ "$1" = "-s" ]; then skip=1; shift; fi

# <prog>: e.g. "feh -." for viewing images, "xpdf" for .pdfs, etc.
if [ -z "$2" ]; then
    echo "$(basename "$0") [-s] <prog> <path/to/dir/>" 1>&2
    exit 1
fi

for x in "$2"/*; do
    if [ -n "$skip" ] && echo "$x" | grep -q -- "$sep"; then
        continue
    fi

    # $1 is left unquoted on purpose: it may carry arguments
    $1 "$x" &
    pid=$!

    echo "$x"; printf 'new set of tags: '

    # assume ^D
    if ! read -r ts; then kill $pid 2>/dev/null || true; exit 0; fi

    # $pid may have been killed already
    kill $pid 2>/dev/null || true

    # Empty string is a no-op; $ts is unquoted to pass one argument per tag
    if [ "$ts" != "" ]; then tag "$x" $ts; fi
done