Forums / Cotonti / Core Labs / Ideas / Move configuration files to cfg folder

And leave datas for runtime-generated files only

GHengeveld
#31782 2011-12-02 20:11

Considering Kilandor's arguments here and the relation of this topic to the PFS and file-handling in Cotonti in general, I suggest the following:

  • Move config.php to /system. Put filetype/mimetype config elsewhere (see below).
  • The location of config.php should be defined in one place (probably index.php), in order to allow moving the file elsewhere (e.g. outside of document root).
  • Introduce a Filesystem API for file storage and retrieval, so extensions can use its functions instead of having to deal with the filesystem and its directory structure. These functions should be overridable to allow custom file handling and alternative storage locations.
  • Define mimetypes and filetypes in the Filesystem API.
  • The Filesystem API should have a legacy mode or converter for the old PFS storage modes (with or without FSM).
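A minimal sketch of how such an overridable Filesystem API could be structured. It is written in Python for brevity even though Cotonti is PHP, and every name in it (`FILETYPES`, `mimetype_for`, `StorageHandler`, `FlatStorage`, `storage`) is hypothetical, not an existing Cotonti function. Extensions would only call `storage.store()` and `storage.fetch()`, so a site could swap in a handler that writes elsewhere (outside the document root, another disk) without touching the extensions.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

# Hypothetical central filetype/mimetype table owned by the Filesystem API
# (a few illustrative entries only).
FILETYPES = {
    "jpg": "image/jpeg",
    "png": "image/png",
    "pdf": "application/pdf",
    "zip": "application/zip",
}


def mimetype_for(filename: str) -> str:
    """Look up a mimetype by extension, falling back to a generic binary type."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return FILETYPES.get(ext, "application/octet-stream")


class StorageHandler(ABC):
    """Interface that extensions call instead of touching the filesystem."""

    @abstractmethod
    def store(self, name: str, data: bytes) -> str:
        """Save data and return an opaque storage key."""

    @abstractmethod
    def fetch(self, key: str) -> bytes:
        """Return the data saved under a key."""


class FlatStorage(StorageHandler):
    """Default handler: every file in one folder (like PFS without FSM)."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, name: str, data: bytes) -> str:
        # Drop any directory parts so an upload name cannot escape the folder.
        safe = Path(name).name
        (self.root / safe).write_bytes(data)
        return safe

    def fetch(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


# A site would override the handler in this one place; extensions only see `storage`.
storage: StorageHandler = FlatStorage(Path(tempfile.mkdtemp()) / "datas" / "users")
```

Because the filetype table sits next to the storage code, `mimetype_for("Photo.JPG")` and any future upload checks read from one place.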

The reason to allow custom file storage handling is clear: default file handling isn't suitable for all sites. Currently the PFS supports storing all files in a single folder, which is a very crude option, and it supports 'Folder Storage Mode' (FSM), which gives each user their own folder and stores their uploads there. Both options are relatively simple and won't scale well (think 1000+ files, or 10000+ with FSM). Then there is the issue of naming files. Some prefer filenames with a timestamp, some want the user ID in there, others just want to retain the original filename (which is tricky, since users are likely to upload files with identical names). I think we should let the admin choose which storage handling they prefer. This also means we will need a way to move files around so the storage mode can be changed afterwards.
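The naming preferences mentioned above could be offered as interchangeable strategies the admin picks from. A rough Python sketch; the strategy keys and the `_1`, `_2` suffix used to resolve duplicate names are assumptions for illustration, not existing PFS behaviour.

```python
import time
from pathlib import PurePosixPath
from typing import Callable, Dict, Optional, Set

Strategy = Callable[[str, int, float], str]


def by_timestamp(original: str, user_id: int, now: float) -> str:
    return f"{int(now)}_{original}"          # e.g. "1322856660_report.pdf"


def by_user_id(original: str, user_id: int, now: float) -> str:
    return f"{user_id}_{original}"           # e.g. "42_report.pdf"


def keep_original(original: str, user_id: int, now: float) -> str:
    return original                          # collisions handled below


STRATEGIES: Dict[str, Strategy] = {
    "timestamp": by_timestamp,
    "userid": by_user_id,
    "original": keep_original,
}


def make_unique(name: str, taken: Set[str]) -> str:
    """Append _1, _2, ... before the extension until the name is free."""
    if name not in taken:
        return name
    path = PurePosixPath(name)
    n = 1
    while f"{path.stem}_{n}{path.suffix}" in taken:
        n += 1
    return f"{path.stem}_{n}{path.suffix}"


def storage_name(strategy: str, original: str, user_id: int,
                 taken: Set[str], now: Optional[float] = None) -> str:
    """Apply the admin-selected strategy, then resolve name clashes."""
    now = time.time() if now is None else now
    return make_unique(STRATEGIES[strategy](original, user_id, now), taken)
```

For example, `storage_name("original", "report.pdf", 42, {"report.pdf"})` gives `report_1.pdf`. Switching strategies later would still mean renaming (and possibly moving) the files already stored.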

My solution to file storage is like this:

  • Generate a random hexadecimal hash for each file. Each hash must be unique.
  • Files are renamed to their hash. This means files lose their original filename and will not have an extension.
  • Original filename, extension, hash, filepath and other relevant info are stored in the database.
  • Files are stored in datas/raw
  • Within datas/raw, 16 folders are created, one for each symbol in the hexadecimal range (abcdef1234567890).
  • Files are stored in the folder matching the first symbol of their hex filename, which spreads files roughly evenly over the 16 folders.
  • When the total number of files in the system exceeds a certain amount, each folder will get 16 subfolders and newly added files are stored in the subfolder matching the first and second symbols of their filename (file cf766fdc72 is stored in datas/raw/c/f/).
  • The depth of subfolders will increase as the total number of stored files grows. Each extra level multiplies the number of leaf folders by 16 (16, 256, 4096, etc.), so adding a level each time the total grows 16-fold (at roughly 1,600, 25,600 and 409,600 files) will keep each folder at no more than about 100 files.
  • While files will still be directly accessible, it will be extremely hard to find what you are looking for, especially since you won't know the file type. Therefore all requests for files must be handled by PHP, which can look up a file by its original name in the database and serve the correct file to the user under its original filename. This makes the underlying storage method completely invisible to end users.
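The storage layout in the list above can be sketched in a few lines. This Python version is illustrative only: `secrets.token_hex` produces the random name, a plain set stands in for the database uniqueness check, and the depth rule (one more folder level each time the total grows 16-fold past 100 files per folder) is an assumption chosen to match the "about 100 files per folder" target.

```python
import secrets
from pathlib import PurePosixPath
from typing import Set

FILES_PER_FOLDER = 100   # target from the proposal ("100 files or so")
HASH_LENGTH = 10         # hex characters, e.g. "cf766fdc72"


def new_hash(existing: Set[str]) -> str:
    """Random lowercase hex name, retried until it doesn't collide with a stored one."""
    while True:
        candidate = secrets.token_hex(HASH_LENGTH // 2)  # 5 bytes -> 10 hex chars
        if candidate not in existing:
            return candidate


def folder_depth(total_files: int) -> int:
    """Smallest depth (at least 1) where 16**depth folders hold ~FILES_PER_FOLDER each."""
    depth = 1  # the proposal always starts with 16 top-level folders
    while FILES_PER_FOLDER * 16 ** depth < total_files:
        depth += 1
    return depth


def storage_path(file_hash: str, depth: int, root: str = "datas/raw") -> PurePosixPath:
    """One folder level per leading hex digit: cf766fdc72 at depth 2 -> c/f/."""
    return PurePosixPath(root, *file_hash[:depth], file_hash)
```

Because the full path is stored in the database, files written before the depth grew can stay where they are; only new files go one level deeper.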

This system has several benefits:

  • It will scale very well. Millions of files should be no problem.
  • Files are effectively hidden from direct access (random names, no extension), which improves security and privacy.
  • File downloads will always be handled by PHP, allowing it to keep accurate access logs and other statistics.
  • A file is stored in a way that has no relation to its owner, which increases privacy and allows for easier file sharing between users.
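The PHP download handler would look the file up in the database, count the access and send the file under its original name. The same flow is sketched below in Python with an in-memory SQLite table. The schema is hypothetical, and the lookup is keyed by a numeric file ID instead of the original name, since original names are not guaranteed to be unique.

```python
import sqlite3
from typing import Dict, Optional, Tuple


def open_index() -> sqlite3.Connection:
    """Create a throwaway index table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files ("
        " id INTEGER PRIMARY KEY,"
        " hash TEXT UNIQUE NOT NULL,"
        " path TEXT NOT NULL,"
        " original_name TEXT NOT NULL,"
        " mimetype TEXT NOT NULL,"
        " downloads INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def resolve_download(db: sqlite3.Connection,
                     file_id: int) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (storage path, response headers), or None if the file is unknown."""
    row = db.execute(
        "SELECT path, original_name, mimetype FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        return None  # the web handler would answer 404
    path, original, mimetype = row
    # Every download passes through here, so statistics stay accurate.
    db.execute("UPDATE files SET downloads = downloads + 1 WHERE id = ?", (file_id,))
    db.commit()
    headers = {
        "Content-Type": mimetype,
        # Escaping of unusual characters in the filename is omitted here.
        "Content-Disposition": f'attachment; filename="{original}"',
    }
    return path, headers
```

A real handler would then stream the file from the returned path with those headers.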

Downsides are:

  • Reliance on database integrity. If you lose the database table of file storage locations, you'll basically have to throw away all your files too. I suggest making automatic backups of the table (index) to a special file in datas/raw.
  • Increased file access time and increased server load because PHP will have to handle all file access.
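The suggested safety net of backing up the index could be as simple as dumping the table to a file inside datas/raw and reloading it after a database loss. A Python sketch with SQLite and JSON; the file name `index-backup.json`, the table name and the column list are assumptions.

```python
import json
import sqlite3
from pathlib import Path

COLUMNS = ("hash", "path", "original_name", "mimetype")


def create_index(db: sqlite3.Connection) -> None:
    db.execute(
        "CREATE TABLE IF NOT EXISTS files (hash TEXT PRIMARY KEY, path TEXT NOT NULL,"
        " original_name TEXT NOT NULL, mimetype TEXT NOT NULL)"
    )


def backup_index(db: sqlite3.Connection, target: Path) -> int:
    """Write every index row to a JSON file and return the number of rows."""
    rows = db.execute(f"SELECT {', '.join(COLUMNS)} FROM files ORDER BY hash").fetchall()
    records = [dict(zip(COLUMNS, row)) for row in rows]
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(records, indent=1), encoding="utf-8")
    # Swap in the new backup with one rename, so a crash while writing
    # leaves the previous backup untouched.
    tmp.replace(target)
    return len(records)


def restore_index(db: sqlite3.Connection, source: Path) -> int:
    """Rebuild the index table from a backup file and return the number of rows."""
    records = json.loads(source.read_text(encoding="utf-8"))
    create_index(db)
    db.executemany(
        f"INSERT OR REPLACE INTO files ({', '.join(COLUMNS)}) VALUES (?, ?, ?, ?)",
        [tuple(r[c] for c in COLUMNS) for r in records],
    )
    db.commit()
    return len(records)
```

Running `backup_index` after each upload (or on a schedule) keeps the backup close to the live table.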
This post was edited by GHengeveld (2011-12-02 22:14)