Move configuration files to cfg folder

And leave datas for runtime-generated files only

Trustmaster
#1 2011-12-02 16:27

Currently we have (basically) 3 executable PHP files in the datas folder: config.php, extensions.php and mimetype.php. The rest of the datas contents is user files, uploads and temporary files.

There is a proposal in the community to put configuration files into a separate root folder named 'cfg'. The reasoning behind such a measure is:

  1. It is more logical to keep configuration apart from user-generated contents.
  2. It would simplify security/rights management in datas. You could just make the entire folder writeable recursively. What's more, it would be possible to disallow PHP execution in datas (at least with Apache), which would improve security.

The downside is backwards compatibility. Although the path to config.php is hardcoded, it is quite easy to replace en masse. But on existing sites the files would have to be moved to the new folder manually, because install.php isn't likely to have enough permissions.

Post here what you think and some alternative solutions.

May the Source be with you!
GHengeveld
#2 2011-12-02 20:11

Considering Kilandor's arguments here and the relation of this topic to the PFS and file-handling in Cotonti in general, I suggest the following:

  • Move config.php to /system. Put filetype/mimetype config elsewhere (see below).
  • The location of config.php should be defined in one place (probably index.php), in order to allow moving the file elsewhere (e.g. outside of document root).
  • Introduce a Filesystem API for file storage and retrieval functions, so extensions can use these instead of having to bother with the filesystem and its structure. These functions should be overridable to allow custom file handling and alternative storage locations.
  • Define mimetypes and filetypes in the Filesystem API.
  • Filesystem API should have a legacy mode or converter for old PFS storage modes (FSM or no FSM).

The reason to allow custom file storage handling is clear: default file handling isn't suitable for all sites. Currently the PFS supports storing all files in 1 folder, which is a very crude option, and it supports 'Folder Storage Mode' which gives each user its own folder and stores their uploads there. Both of these options are relatively simple and won't really scale well (talking 1000+ files here, or 10000+ for FSM). And then there is the issue of naming files. Some prefer to have filenames with a timestamp, some want the user ID in there, others just want to retain the original filename (which is tricky since users are likely to upload files with identical names). I think we should allow the admin to choose which storage handling he prefers. This also means we will need a way to move files around to allow changing storage mode afterwards.

My solution to file storage is like this:

  • Calculate a random hexadecimal hash for each file. A hash must be unique.
  • Files are renamed to their hash. This means files lose their original filename and will not have an extension.
  • Original filename, extension, hash, filepath and other relevant info are stored in the database.
  • Files are stored in datas/raw
  • Within datas/raw, 16 folders are created, one for each symbol in the hexadecimal range (abcdef1234567890).
  • Files are stored in the folder which matches the first symbol in the hex filename, dividing files evenly over 16 folders.
  • When the total number of files in the system exceeds a certain amount, each folder will get 16 subfolders and newly added files are stored in the subfolder matching the first and second symbols of their filename (file cf766fdc72 is stored in datas/raw/c/f/).
  • The depth of subfolders will increase as the total number of stored files increases exponentially (100, 10000, 100000000 etc.). This will ensure each folder will never contain more than 100 files or so.
  • While files will still be directly accessible, it will be extremely hard to find what you are looking for, especially since you won't know the file type. Therefore all calls to files must be handled by PHP, which can look up a file by its original name in the database and serve the correct file to the user with the original filename. This makes the underlying storage method completely invisible to the end users.
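
The path scheme above can be sketched roughly as follows. This is only an illustration in Python of the logic, not Cotonti code: the function names, the 5-byte hash length and the threshold of about 100 files per folder are assumptions chosen to match the example in the list.

```python
import secrets
from pathlib import PurePosixPath


def new_file_hash(existing, nbytes=5):
    """Return a random hex name (2 * nbytes chars) not already in `existing`."""
    while True:
        name = secrets.token_hex(nbytes)
        if name not in existing:  # a hash must be unique
            return name


def shard_depth(total_files, per_folder=100):
    """Folder depth needed so 16**depth folders hold about `per_folder` files each."""
    depth = 1
    while total_files > per_folder * 16 ** depth:
        depth += 1
    return depth


def storage_path(file_hash, depth, root="datas/raw"):
    """Build the path: one folder level per leading hex symbol of the hash."""
    parts = list(file_hash[:depth])
    return str(PurePosixPath(root, *parts, file_hash))


print(storage_path("cf766fdc72", 2))  # datas/raw/c/f/cf766fdc72
```

Since only newly added files use the deeper layout, the path (or the depth) that was in effect at upload time would have to be stored in the database with the file, as the list already suggests.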

This system has several benefits:

  • It will scale very well. Millions of files is no problem.
  • Files are protected from direct access, drastically improving security and privacy.
  • File downloads will always be handled by PHP, allowing it to keep accurate access logs and other statistics.
  • A file is stored in a way that has no relation to its owner, which increases privacy and allows for easier file sharing between users.

Downsides are:

  • Reliance on database integrity. If you lose the database table of file storage locations, you'll basically have to throw away all your files too. I suggest making automatic backups of the table (index) to a special file in datas/raw.
  • Increased file access time and increased server load because PHP will have to handle all file access.

This post was edited by GHengeveld (2011-12-02 22:14, 12 years ago)
esclkm
#3 2011-12-03 06:45

I think a cfg folder is a good option.

littledev.ru - my small, fledgling blog about Cotonti.
Reducing the cost of programming and reducing the cost of production are different things. The former is better compared to handing workers cheap tools than to cutting their wages.
Trustmaster
#4 2011-12-04 12:39

Gert, that sounds way too difficult even for me, not to mention the others :)

GHengeveld
#5 2011-12-04 19:57

Okay, but I still think a generic API for file storage is a welcome addition.

Actually, I've been working on the storage method I described for a while and it isn't that complicated, except for the exponential part (which is optional).

Twiebie
#6 2011-12-05 04:34

May I add that I think it's best if files keep their original filename? Of course something can/should be added to the filename to make it unique, but I think it's better for users if they can actually see the name of a file they are downloading. Something like cf766fdc72.rar doesn't really say much to the average person who's looking for a file.

In the method you described, the original filename is stored in the database, so it could be displayed in PFS. But that would still leave the problem of randomly generated filenames, which in my opinion are just not as nice as actual filenames (when linking to a file etc.).

GHengeveld
#7 2011-12-05 08:40
PHP will change the file name back before returning the file to the user, so the user will never see the random file name, even when he downloads it. This can be done by sending a Content-Disposition: attachment header with a filename parameter (Google it).
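
As a rough illustration of that header, here is how its value could be built. It is sketched in Python; the function name is made up for the example, and in PHP the resulting string would be sent with header(). The filename* form is the standard way to carry non-ASCII names alongside an ASCII fallback.

```python
from urllib.parse import quote


def content_disposition(original_name):
    """Build a Content-Disposition value that restores the original filename."""
    # ASCII fallback: replace quotes, backslashes and non-printable/non-ASCII chars.
    fallback = "".join(
        c if 32 <= ord(c) < 127 and c not in '"\\' else "_"
        for c in original_name
    )
    # Percent-encoded UTF-8 form for clients that understand filename*.
    encoded = quote(original_name, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


# The file is stored as cf766fdc72 but served under the name from the database.
print("Content-Disposition: " + content_disposition("holiday photos.rar"))
```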
esclkm
#8 2011-12-05 12:22

GHengeveld - I don't understand the point of renaming files. I think it's very bad form.

GHengeveld
#9 2011-12-05 22:22

Ok, never mind my argument towards a generic file API. I'll just continue to work on a separate module which incorporates these features.

I'm fine with a cfg subfolder, even though I think the root is getting a bit too crowded (I think routing most of the stuff through index.php will solve that).

Kilandor
#10 2011-12-07 22:57

Well I think a file api would be fine and a good idea as a basis. I think some of the things described are a bit overkill and will not matter to a majority of users.

I would also go so far as to take the base of the idea and have the API build its folder structure per module/plugin. Say I have a plugin for image uploads and another for videos: the folders would be datas/raw/images and datas/raw/video. This would allow for quick cleanup, say, on removal.

I started to think the API should not have a database and the module/plugin should handle it, but I changed my mind. The API should have a database. It should also record which module/plugin and which user each file belongs to, so files can still be identified on the backend without putting that information in the actual filename, as is done now (e.g. 1-Foo.png).

I do not think that we need exponential folders and a limit on the number of files in them. It would require tracking or something else. Just for example, MediaWiki simply uses the first 2 characters from the MD5 of the file name. It does not grow its folders exponentially.

Of course we could use PHP's hashing functions to hash the uploaded file completely. But depending on the size of the file, this could very easily fail on some servers/hosts.

So I think hashing off the file name is the only logical choice. That would still let anyone who knows the name work out the path, though. So appending the timestamp to the filename, simply for the purpose of generating the hash, would likely be appropriate.
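
A minimal sketch of that idea, assuming the per-module folders mentioned earlier in this post. It is written in Python for illustration only; the function name, the `area` parameter and the exact path layout are assumptions, not an existing Cotonti API.

```python
import hashlib


def hashed_location(original_name, uploaded_at, area="images", root="datas/raw"):
    """Hash filename + timestamp and bucket by the first 2 hex chars of the MD5."""
    # Only the name and timestamp are hashed, never the file contents,
    # so the cost does not depend on the size of the upload.
    digest = hashlib.md5(f"{original_name}{uploaded_at}".encode("utf-8")).hexdigest()
    return f"{root}/{area}/{digest[:2]}/{digest}"


# Same name uploaded at different times lands in different places.
print(hashed_location("Foo.png", 1323298620))
print(hashed_location("Foo.png", 1323298621))
```

Two characters give a fixed 256 buckets per module folder, with no need to track the total number of files.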

I think the file handling should be a separate issue. For the task at hand, and per the topic, I think the config should simply be moved to cfg. This way things like extension/mimetype can have our Cotonti system defaults, which can be overridden by or added to by custom configs in the cfg folder.
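
The override idea could look roughly like this. It is a Python sketch of the merge logic only; the function name and the sample mimetype entries are made up for illustration and are not the contents of the real mimetype.php.

```python
def merge_config(defaults, custom):
    """System defaults first; entries from the cfg folder override or add to them."""
    merged = dict(defaults)  # copy so the shipped defaults stay untouched
    merged.update(custom)
    return merged


# Hypothetical system defaults shipped with the core.
defaults = {"png": "image/png", "zip": "application/zip"}
# Hypothetical site-specific overrides loaded from the cfg folder.
custom = {"zip": "application/x-zip-compressed", "webp": "image/webp"}

mimetypes = merge_config(defaults, custom)
print(mimetypes)
```

With this split, an upgrade can replace the system defaults without touching anything the site owner put in cfg.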