POV-Ray: Newsgroups: povray.off-topic: File name dilemma

From: Tim Cook
Subject: File name dilemma
Date: 22 May 2009 18:43:51
Message: <4a172aa7$1@news.povray.org>

Progressing through my image library, I employ a scheme whereby each 
artist is in their own folder.  Logically, the folder is given the name 
of the artist (and, where possible, the years of birth and death, added 
since the massive image library I absorbed included a large number of 
those).

But that's not the problem.

The problem is the individual files.  Ideally, an image has the date 
created and its title.  Prepending the artist's name thereto is 
redundant, in my system, and possibly encounters the issue of filename 
length limits, especially when other data which was originally in the 
filename (if the image is part of an external grouping, for instance) is 
also included.  Additionally, if someone were to browse my collection 
and decide they want a particular image, but just one, the artist 
information is then lost.

What should I do?

--
Tim Cook
http://empyrean.freesitespace.net

From: Paul Fuller
Subject: Re: File name dilemma
Date: 22 May 2009 20:00:24
Message: <4a173c98@news.povray.org>

Tim Cook wrote:
> 
> What should I do?
> 

A radical suggestion - Forget about naming and structuring the images as 
files.

Just give them arbitrary sequence numbers: 1.jpg, 2.png ...

Then tag the files with the metadata that makes them findable.  Use 
either file tag fields or create a database with the FileId, Artist, 
CreatedDate etc.  Even add a thumbnail.

Write something to suck in your existing collection from a directory 
structure and turn the path data into metadata attached to the sequence 
numbered files.
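A minimal sketch of such an import script, in Python with SQLite. The `images` table, its column names, and the assumption that the first folder level under the root is the artist are all hypothetical, chosen only for illustration. The sketch records the sequence-numbered name each file would get; it does not rename or move anything.

```python
import sqlite3
from pathlib import Path


def import_collection(root, db_path=":memory:"):
    """Record one row per file found under root/<artist>/... (layout is an assumption)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " file_id INTEGER PRIMARY KEY,"
        " artist TEXT NOT NULL,"
        " original_name TEXT NOT NULL,"
        " stored_name TEXT)"
    )
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # The first path component is taken to be the artist folder.
        artist = rel.parts[0] if len(rel.parts) > 1 else ""
        cur = conn.execute(
            "INSERT INTO images (artist, original_name) VALUES (?, ?)",
            (artist, path.name),
        )
        # The planned sequence-numbered name, e.g. "1.jpg"; nothing is renamed here.
        stored = f"{cur.lastrowid}{path.suffix.lower()}"
        conn.execute(
            "UPDATE images SET stored_name = ? WHERE file_id = ?",
            (stored, cur.lastrowid),
        )
    conn.commit()
    return conn
```

Calling `import_collection("D:/images", "library.db")` (both paths hypothetical) would leave the original tree untouched and produce a table that a separate step could use to do the actual renaming.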

Now you can search them using any decent database front-end, search 
engine or custom application.  Reclassify them by changing the metadata 
rather than moving files between directories.

Trying to keep things in a structure by conflating names, dates, 
descriptions etc quickly leads to the problems that you are hitting.

What about a painting, piece of music or novel that has multiple 
creators?  Which name goes first?  Should you order the structure levels 
by Period, School, Style, Media, Artist?  What about if the same artist 
painted in different styles or media?  What about when one work includes 
or is derived from another?  Duplicate names, multiple works on the same 
subject by the same artist (Van Gogh's Sunflowers for example).

Consider if for example some of the files are linked to web pages, used 
as resources for renders etc by path and name.  Then you correct a typo 
in the artist name or add a date of death.  Suddenly all of the links 
from multiple unknown places are broken.

Now of course if you use a file somewhere the reference to 43857.png 
isn't particularly meaningful.  So instead use a meaningful variable 
name to hold the name - RockyPlanetTexture = '43857.png'.  Or include a 
comment generated from the metadata where you make the initial reference.

"Names are fickle, Identifiers are permanent"

From: Tim Cook
Subject: Re: File name dilemma
Date: 22 May 2009 20:51:18
Message: <4a174886$1@news.povray.org>

Paul Fuller wrote:
> "Names are fickle, Identifiers are permanent"

The problem with tagging is that either a) it stores the data in an 
external database that doesn't follow the file when you copy it to a new 
medium or b) it alters the file, changing its MD5 hash and making it not 
identical to the other copies of it on the internet (or wherever), so 
it's harder to find duplicates in large sets of files or quickly 
determine which images of an existing set are missing.

--
Tim Cook
http://empyrean.freesitespace.net

From: Darren New
Subject: Re: File name dilemma
Date: 22 May 2009 21:08:10
Message: <4a174c7a$1@news.povray.org>

Tim Cook wrote:
> medium or b) it alters the file, changing its MD5 hash and making it not 

Store the pre-change MD5 as one of the attributes of the file. Or the name 
of the file.

-- 
   Darren New, San Diego CA, USA (PST)
   There's no CD like OCD, there's no CD I knoooow!

From: Paul Fuller
Subject: Re: File name dilemma
Date: 23 May 2009 11:15:45
Message: <4a181321@news.povray.org>

Tim Cook wrote:
> Paul Fuller wrote:
>> "Names are fickle, Identifiers are permanent"
> 
> The problem with tagging is that either a) it stores the data in an 
> external database that doesn't follow the file when you copy it to a new 
> medium or b) it alters the file, changing its MD5 hash and making it not 
> identical to the other copies of it on the internet (or wherever), so 
> it's harder to find duplicates in large sets of files or quickly 
> determine which images of an existing set are missing.
> 
> -- 
> Tim Cook
> http://empyrean.freesitespace.net

Well if you rule out adding external data and using the file attributes 
then you are stuck with hierarchy and that is just the least useful 
thing for reasons already mentioned.

I agree with Darren's answer that you can always store the original MD5 
as an attribute.  Ditto for the original name, source website and any 
other attributes that you are interested in.

I also should have said that you should of course normalise the data. 
So for example don't tag the file with the Artist's DOB etc.

You can always generate whatever structure you want from the metadata. 
Say if you want to export selected files for some purpose where the 
database approach does not work.

The advantage is that you can generate whatever from the metadata. 
Can't say the same for the structure approach.

From: Darren New
Subject: Re: File name dilemma
Date: 23 May 2009 14:01:13
Message: <4a1839e9@news.povray.org>

Paul Fuller wrote:
> I agree with Darren's answer that you can always store the original MD5 
> as an attribute.  Ditto for the original name, source website and any 
> other attributes that you are interested in.

If you use the original MD5 as the file name, then add the attributes to it, 
you don't even have to look at the metadata to tell if you already have the 
file. You don't have to hash it, or store the hash anywhere else.
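A rough sketch of the idea in Python, using the standard `hashlib` module. The function names and the flat library directory are assumptions made for illustration. Only the incoming file gets hashed; the files already in the library are never re-read, because their names are their hashes.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks so large images fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashed_name(path):
    """Name the file would carry in the library: <md5><lowercased extension>."""
    return md5_of(path) + Path(path).suffix.lower()


def already_have(candidate, library_dir):
    """True if a file with the candidate's hash-based name is already in library_dir."""
    return (Path(library_dir) / hashed_name(candidate)).exists()
```

The check is a single filename lookup, so it stays cheap however large the library grows. It only detects byte-identical copies, and only as long as the stored files keep the name computed before any tags were added.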

> The advantage is that you can generate whatever from the metadata. Can't 
> say the same for the structure approach.

You can. You're just storing the metadata in the file name. If you run thru 
the full path, you can extract whatever metadata you need temporarily. Using 
a file system as a database this way is a pretty common technique. (There 
even used to be a pre-SQL database mechanism called "hierarchical database" 
that worked just like that.)
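As an illustration of running through the path, here is a small Python sketch. The layout it parses, `Artist (born-died)/YYYY Title.ext`, is an assumption based on the scheme described at the start of the thread; the exact separators and the two regular expressions are hypothetical.

```python
import re
from pathlib import PurePath

# Folder: "Artist Name" with an optional " (born-died)" suffix (assumed format).
FOLDER_RE = re.compile(r"^(?P<artist>.+?)(?: \((?P<born>\d{4})-(?P<died>\d{4})\))?$")
# File stem: optional four-digit year, a space, then the title (assumed format).
FILE_RE = re.compile(r"^(?:(?P<year>\d{4}) )?(?P<title>.+)$")


def metadata_from_path(path):
    """Extract artist, years and title from a path; missing parts come back as None."""
    p = PurePath(path)
    folder = FOLDER_RE.match(p.parent.name)
    name = FILE_RE.match(p.stem)
    if folder is None or name is None:
        raise ValueError(f"path does not follow the assumed layout: {path}")
    return {
        "artist": folder["artist"],
        "born": folder["born"],
        "died": folder["died"],
        "year": name["year"],
        "title": name["title"],
    }
```

Nothing in the file system enforces the layout, so any path that drifts from it is either rejected or silently misparsed; that is the cost of storing the metadata in names.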

-- 
   Darren New, San Diego CA, USA (PST)
   There's no CD like OCD, there's no CD I knoooow!

From: Tim Cook
Subject: Re: File name dilemma
Date: 23 May 2009 17:32:19
Message: <4a186b63$1@news.povray.org>

Darren New wrote:
> If you use the original MD5 as the file name, then add the attributes to 
> it, you don't even have to look at the metadata to tell if you already 
> have the file. You don't have to hash it, or store the hash anywhere else.

Ehhh, well, the software I use generates and can perform operations on a 
file's md5, but not (that I know how to do, at any rate) search for 
similar images by comparing the filename to the md5.  Also, when 
casually browsing the drive from explorer, md5-as-filename isn't the 
most user-friendly option.  My original problem was whether or not to 
include redundant data in a filename and its path...which I already do 
for manga, so that question is more or less answered for the sake of 
uniformity, as well as the fact that if I post a single image to a 
bulletin board, it only refers to the filename, not its path.

--
Tim Cook
http://empyrean.freesitespace.net

From: Paul Fuller
Subject: Re: File name dilemma
Date: 23 May 2009 20:47:05
Message: <4a189909$1@news.povray.org>

Darren New wrote:
> 
> You can. You're just storing the metadata in the file name. If you run 
> thru the full path, you can extract whatever metadata you need 
> temporarily. Using a file system as a database this way is a pretty 
> common technique. (There even used to be a pre-SQL database mechanism 
> called "hierarchical database" that worked just like that.)
> 

Well yes - not as easily and only up to some file system imposed limit.

Some scheme to store data within the file path/name is certainly 
possible (I use it all the time).  As the amount of metadata grows 
though it becomes cumbersome and moving to a proper database makes sense.

Such a scheme is not assisted by the OS or most tools and utilities. 
For example I could easily botch part of the path by swapping the 3rd 
and 7th elements.  Or format dates wrongly in the 4th element.  Nothing 
helps me to keep the scheme in place.

Actually I could store the content of a file in its filename (up to some 
limit) and then just make the file have the EOF character in it.  If it 
is a binary file then just ASCII encode it!  Extreme case.

Eg. C:\Shopping List\Groceries\20090524\Sunday\Tea, Biscuits (sweet), 4 
Oranges, Fresh Corn on the cob (or canned if none available) and so on 
until the OS won't let me continue.txt

Whereas if I put the metadata into database fields or file attributes 
then some validation is enforced by the field type.  There are well 
supported concepts like sorting and searching.

Since you bring up hierarchical databases, I note that the relational 
model has pretty much won that debate.  Sure there are still examples of 
H databases but I think you would agree that R is dominant today?  And 
for the same reasons that I advocated the approach to Tim.

Relational can't solve all problems but it is generally a closer 
representation of the data, easier to manipulate, less redundancy, 
better supported by tools ...

From: Darren New
Subject: Re: File name dilemma
Date: 23 May 2009 21:23:04
Message: <4a18a178$1@news.povray.org>

Paul Fuller wrote:
> Well yes - not as easily and only up to some file system imposed limit.

Sure.

> Whereas if I put the metadata into database fields or file attributes 
> then some validation is enforced by the field type.  There are well 
> supported concepts like sorting and searching.

Yep. And Windows at least does the indexing for you if you put the tags in 
the metadata of the file, letting you do searches and editing of metadata 
entirely within the tools that come with the system. (Which can be annoying 
sometimes if that's not what you're trying to search. :-)

> Since you bring up hierarchical databases, I note that the relational 
> model has pretty much won that debate.  Sure there are still examples of 
> H databases but I think you would agree that R is dominant today?  And 
> for the same reasons that I advocated the approach to Tim.

I wasn't trying to argue that he should use a non-relational database. I was 
simply saying that (contrary to what some of the younger generation might 
know) relational databases are at the end of the evolutionary chain, not the 
start.

> Relational can't solve all problems but it is generally a closer 
> representation of the data, easier to manipulate, less redundancy, 
> better supported by tools ...

Yes to all that. I don't even know of any existing hierarchical databases 
*other* than UNIX-style file systems.

-- 
   Darren New, San Diego CA, USA (PST)
   There's no CD like OCD, there's no CD I knoooow!

From: Nicolas Alvarez
Subject: Re: File name dilemma
Date: 24 May 2009 01:19:28
Message: <4a18d8df@news.povray.org>

Tim Cook wrote:
> Paul Fuller wrote:
>> "Names are fickle, Identifiers are permanent"
> 
> The problem with tagging is that either a) it stores the data in an
> external database that doesn't follow the file when you copy it to a new
> medium or b) it alters the file, changing its MD5 hash and making it not
> identical to the other copies of it on the internet (or wherever), so
> it's harder to find duplicates in large sets of files or quickly
> determine which images of an existing set are missing.

for a) If you copy all the files to a new medium, copy the database along.
If you send just one file to someone, use a script that reads metadata from
the database and puts it on the filename before sending.

for b) Other people may also use tagging, or recompress a JPEG image, or
[...] which would already make your pictures have a different MD5 from
theirs. Use a real image comparison algorithm if you want to compare.

And in Windows, a lot of metadata may be stored in "alternate data streams"
instead of EXIF or something similar inside the file itself.
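The script suggested under a) could look roughly like this in Python with SQLite. The `images` table with `artist`, `title` and `stored_name` columns is a hypothetical schema, and the 255-character cap and the set of replaced characters are assumptions about what common file systems accept, not a complete rule set.

```python
import sqlite3
from pathlib import PurePath


def export_name(conn, file_id, max_len=255):
    """Build a descriptive 'Artist - Title.ext' filename from database metadata."""
    row = conn.execute(
        "SELECT artist, title, stored_name FROM images WHERE file_id = ?",
        (file_id,),
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    artist, title, stored_name = row
    suffix = PurePath(stored_name).suffix
    stem = f"{artist} - {title}"
    # Replace characters that Windows does not allow in file names.
    for ch in '<>:"/\\|?*':
        stem = stem.replace(ch, "_")
    # Truncate the stem so the extension always survives the length cap.
    return stem[: max_len - len(suffix)] + suffix
```

The file on disk keeps its arbitrary identifier; only the copy handed to someone else gets the descriptive name, so the artist information travels with it.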
