Progressing through my image library, I employ a scheme whereby each
artist is in their own folder. Logically, the folder is given the name
of the artist (and, where possible, the years of birth and death, added
since the massive image library I absorbed included a large number of
those).
But that's not the problem.
The problem is the individual files. Ideally, an image has the date
created and its title. Prepending the artist's name thereto is
redundant, in my system, and possibly encounters the issue of filename
length limits, especially when other data which was originally in the
filename (if the image is part of an external grouping, for instance) is
also included. Additionally, if someone were to browse my collection
and decide they want a particular image, but just one, the artist
information is then lost.
What should I do?
--
Tim Cook
http://empyrean.freesitespace.net
Tim Cook wrote:
>
> What should I do?
>
A radical suggestion - Forget about naming and structuring the images as
files.
Just give them arbitrary sequence numbers: 1.jpg, 2.png ...
Then tag the files with the metadata that makes them findable. Use
either file tag fields or create a database with the FileId, Artist,
CreatedDate etc. Even add a thumbnail.
Write something to suck in your existing collection from a directory
structure and turn the path data into metadata attached to the sequence
numbered files.
Now you can search them using any decent database front-end, search
engine or custom application. Reclassify them by changing the metadata
rather than moving files between directories.
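A minimal sketch of that import step, assuming Python with its bundled SQLite module and assuming the immediate parent folder of each file is the artist name. The table name `image`, its columns, and the function `ingest` are made up for illustration; copying each file to its sequence-numbered name is left out.

```python
import sqlite3
from pathlib import Path

def ingest(root: Path, db: sqlite3.Connection) -> int:
    """Record every file under root; the parent folder name becomes the artist."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS image ("
        " file_id INTEGER PRIMARY KEY,"  # the arbitrary sequence number
        " artist TEXT, original_name TEXT, original_path TEXT)"
    )
    count = 0
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Hypothetical mapping: folder name -> artist field.
            db.execute(
                "INSERT INTO image (artist, original_name, original_path)"
                " VALUES (?, ?, ?)",
                (path.parent.name, path.name, str(path)),
            )
            count += 1
    db.commit()
    return count
```

After that, a query such as `SELECT file_id FROM image WHERE artist = ?` finds the files, and reclassifying is an `UPDATE` rather than a move between directories.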
Trying to keep things in a structure by conflating names, dates,
descriptions etc quickly leads to the problems that you are hitting.
What about a painting, piece of music or novel that has multiple
creators? Which name goes first? Should you order the structure levels
by Period, School, Style, Media, Artist? What about if the same artist
painted in different styles or media? What about when one work includes
or is derived from another? What about duplicate names, or multiple works on
the same subject by the same artist (Van Gogh's Sunflowers, for example)?
Consider if for example some of the files are linked to web pages, used
as resources for renders etc by path and name. Then you correct a typo
in the artist name or add a date of death. Suddenly all of the links
from multiple unknown places are broken.
Now of course if you use a file somewhere the reference to 43857.png
isn't particularly meaningful. So instead use a meaningful variable
name to hold the name - RockyPlanetTexture = '43857.png'. Or include a
comment generated from the metadata where you make the initial reference.
"Names are fickle, Identifiers are permanent"
Paul Fuller wrote:
> "Names are fickle, Identifiers are permanent"
The problem with tagging is that either a) it stores the data in an
external database that doesn't follow the file when you copy it to a new
medium or b) it alters the file, changing its MD5 hash and making it not
identical to the other copies of it on the internet (or wherever), so
it's harder to find duplicates in large sets of files or quickly
determine which images of an existing set are missing.
--
Tim Cook
http://empyrean.freesitespace.net
Tim Cook wrote:
> medium or b) it alters the file, changing its MD5 hash and making it not
Store the pre-change MD5 as one of the attributes of the file. Or the name
of the file.
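Roughly, assuming Python and its standard `hashlib` module, and run before any tags are written into the file. The helper names `md5_of` and `hashed_name` are just for illustration.

```python
import hashlib
from pathlib import Path

def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large images need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hashed_name(path: Path) -> str:
    """File name made of the pre-change MD5 plus the original extension."""
    return md5_of(path) + path.suffix.lower()
```

The string from `md5_of` can go into an attribute or a database column; `hashed_name` covers the "name of the file" variant.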
--
Darren New, San Diego CA, USA (PST)
There's no CD like OCD, there's no CD I knoooow!
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
Tim Cook wrote:
> Paul Fuller wrote:
>> "Names are fickle, Identifiers are permanent"
>
> The problem with tagging is that either a) it stores the data in an
> external database that doesn't follow the file when you copy it to a new
> medium or b) it alters the file, changing its MD5 hash and making it not
> identical to the other copies of it on the internet (or wherever), so
> it's harder to find duplicates in large sets of files or quickly
> determine which images of an existing set are missing.
>
> --
> Tim Cook
> http://empyrean.freesitespace.net
Well, if you rule out both adding external data and using the file
attributes, then you are stuck with hierarchy, and that is the least useful
option for the reasons already mentioned.
I agree with Darren's answer that you can always store the original MD5
as an attribute. Ditto for the original name, source website and any
other attributes that you are interested in.
I also should have said that you should of course normalise the data.
So for example don't tag the file with the Artist's DOB etc.
You can always generate whatever structure you want from the metadata.
Say if you want to export selected files for some purpose where the
database approach does not work.
The advantage is that you can generate whatever from the metadata.
Can't say the same for the structure approach.
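As a sketch of what generating a structure from the metadata could look like, assuming Python and a metadata record held as a plain dict. The field names (`artist`, `medium`, `year`, `title`, `suffix`) and the "year - title" file name pattern are assumptions for illustration only.

```python
from pathlib import PurePosixPath

def export_path(record: dict, layout: list[str]) -> PurePosixPath:
    """Build folder levels from the chosen fields, file name from year and title."""
    folders = [str(record[field]) for field in layout]
    name = f"{record['year']} - {record['title']}{record['suffix']}"
    return PurePosixPath(*folders, name)

# Sample record; the values are only an example.
record = {"artist": "Van Gogh", "medium": "Oil", "year": 1888,
          "title": "Sunflowers", "suffix": ".jpg"}

by_artist = export_path(record, ["artist"])
by_medium = export_path(record, ["medium", "artist"])
```

The same record gives `Van Gogh/1888 - Sunflowers.jpg` or `Oil/Van Gogh/1888 - Sunflowers.jpg` depending only on the `layout` list, which is the point: the hierarchy is an output, not the storage.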
Paul Fuller wrote:
> I agree with Darren's answer that you can always store the original MD5
> as an attribute. Ditto for the original name, source website and any
> other attributes that you are interested in.
If you use the original MD5 as the file name, then add the attributes to it,
you don't even have to look at the metadata to tell if you already have the
file. You don't have to hash it, or store the hash anywhere else.
> The advantage is that you can generate whatever from the metadata. Can't
> say the same for the structure approach.
You can. You're just storing the metadata in the file name. If you run thru
the full path, you can extract whatever metadata you need temporarily. Using
a file system as a database this way is a pretty common technique. (There
even used to be a pre-SQL database mechanism called "hierarchical database"
that worked just like that.)
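For instance, a sketch in Python of pulling the metadata back out of a path. The naming scheme here, `Artist (born-died)/year - title.ext`, is only a guess at a layout like Tim's, and the names `FOLDER`, `FILE` and `parse` are made up for the example.

```python
import re
from pathlib import PurePosixPath

# Assumed scheme: "Artist (1853-1890)/1888 - Title.jpg"; the years are optional.
FOLDER = re.compile(r"^(?P<artist>.+?)(?: \((?P<born>\d{4})-(?P<died>\d{4})?\))?$")
FILE = re.compile(r"^(?P<year>\d{4}) - (?P<title>.+)$")

def parse(path: str) -> dict:
    """Extract metadata fields from the folder name and the file name."""
    p = PurePosixPath(path)
    folder = FOLDER.match(p.parent.name)
    file = FILE.match(p.stem)
    if folder is None or file is None:
        raise ValueError(f"path does not follow the scheme: {path}")
    return {**folder.groupdict(), **file.groupdict()}
```

Fields that the folder name does not carry (no years given, say) come back as `None`.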
--
Darren New, San Diego CA, USA (PST)
There's no CD like OCD, there's no CD I knoooow!
Darren New wrote:
> If you use the original MD5 as the file name, then add the attributes to
> it, you don't even have to look at the metadata to tell if you already
> have the file. You don't have to hash it, or store the hash anywhere else.
Ehhh, well, the software I use generates and can perform operations on a
file's md5, but not (that I know how to do, at any rate) search for
similar images by comparing the filename to the md5. Also, when
casually browsing the drive from explorer, md5-as-filename isn't the
most user-friendly option. My original problem was whether or not to
include redundant data in a filename and its path...which I already do
for manga, so that question is more or less answered for the sake of
uniformity, as well as the fact that if I post a single image to a
bulletin board, it only refers to the filename, not its path.
--
Tim Cook
http://empyrean.freesitespace.net
Darren New wrote:
>
> You can. You're just storing the metadata in the file name. If you run
> thru the full path, you can extract whatever metadata you need
> temporarily. Using a file system as a database this way is a pretty
> common technique. (There even used to be a pre-SQL database mechanism
> called "hierarchical database" that worked just like that.)
>
Well yes - not as easily and only up to some file system imposed limit.
Some scheme to store data within the file path/name is certainly
possible (I use it all the time). As the amount of metadata grows
though it becomes cumbersome and moving to a proper database makes sense.
Such a scheme is not assisted by the OS or most tools and utilities.
For example I could easily botch part of the path by swapping the 3rd
and 7th elements. Or format dates wrongly in the 4th element. Nothing
helps me to keep the scheme in place.
Actually I could store the content of a file in its filename (up to some
limit) and then just make the file have the EOF character in it. If it
is a binary file then just ASCII encode it! Extreme case.
Eg. C:\Shopping List\Groceries\20090524\Sunday\Tea, Biscuits (sweet), 4
Oranges, Fresh Corn on the cob (or canned if none available) and so on
until the OS won't let me continue.txt
Whereas if I put the metadata into database fields or file attributes
then some validation is enforced by the field type. There are well
supported concepts like sorting and searching.
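A small sketch of that kind of enforcement, assuming Python with SQLite. The table, the column names and the `try_insert` helper are invented for the example; note that SQLite is lenient about column types by default, so the check is spelled out as a constraint.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE image ("
    " file_id INTEGER PRIMARY KEY,"
    " artist TEXT NOT NULL,"
    # Reject anything that is not a whole-number year.
    " created_year INTEGER CHECK ("
    "  typeof(created_year) = 'integer' AND created_year BETWEEN 1 AND 9999))"
)

def try_insert(artist, year) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        db.execute(
            "INSERT INTO image (artist, created_year) VALUES (?, ?)",
            (artist, year),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A badly formatted date or a missing artist is refused at insert time, which is exactly the help a path scheme never gives you.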
Since you bring up hierarchical databases, I note that the relational
model has pretty much won that debate. Sure there are still examples of
H databases but I think you would agree that R is dominant today? And
for the same reasons that I advocated the approach to Tim.
Relational can't solve all problems but it is generally a closer
representation of the data, easier to manipulate, less redundancy,
better supported by tools ...
Paul Fuller wrote:
> Well yes - not as easily and only up to some file system imposed limit.
Sure.
> Whereas if I put the metadata into database fields or file attributes
> then some validation is enforced by the field type. There are well
> supported concepts like sorting and searching.
Yep. And Windows at least does the indexing for you if you put the tags in
the metadata of the file, letting you do searches and editing of metadata
entirely within the tools that come with the system. (Which can be annoying
sometimes if that's not what you're trying to search. :-)
> Since you bring up hierarchical databases, I note that the relational
> model has pretty much won that debate. Sure there are still examples of
> H databases but I think you would agree that R is dominant today? And
> for the same reasons that I advocated the approach to Tim.
I wasn't trying to argue that he should use a non-relational database. I was
simply saying that (contrary to what some of the younger generation might
know) relational databases are at the end of the evolutionary chain, not the
start.
> Relational can't solve all problems but it is generally a closer
> representation of the data, easier to manipulate, less redundancy,
> better supported by tools ...
Yes to all that. I don't even know of any existing hierarchical databases
*other* than UNIX-style file systems.
--
Darren New, San Diego CA, USA (PST)
There's no CD like OCD, there's no CD I knoooow!
Tim Cook wrote:
> Paul Fuller wrote:
>> "Names are fickle, Identifiers are permanent"
>
> The problem with tagging is that either a) it stores the data in an
> external database that doesn't follow the file when you copy it to a new
> medium or b) it alters the file, changing its MD5 hash and making it not
> identical to the other copies of it on the internet (or wherever), so
> it's harder to find duplicates in large sets of files or quickly
> determine which images of an existing set are missing.
for a) If you copy all the files to a new medium, copy the database along
with them. If you send just one file to someone, use a script that reads the
metadata from the database and puts it in the filename before sending.
for b) Other people may also use tagging, or recompress a JPEG image, or
[...] which would already make your pictures have a different MD5 from
theirs. Use a real image comparison algorithm if you want to compare.
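One simple example of such an algorithm is an "average hash", sketched here in Python. It assumes the image has already been decoded, converted to grayscale and shrunk to a small grid (8x8 is common) by some imaging library, which is not shown; `average_hash` and `hamming` are illustrative names, not from any particular tool.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """One bit per pixel: 1 if brighter than the grid's mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits; small means the images look alike."""
    return bin(a ^ b).count("1")
```

Two copies of a picture that differ only by tags or recompression should land within a few bits of each other, whereas their MD5 hashes have nothing in common.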
And in Windows, a lot of metadata may be stored in "alternate data streams"
instead of EXIF or something similar inside the file itself.