Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

File Storage

Options
  • 02-02-2012 12:53pm
    #1
    Registered Users Posts: 586 ✭✭✭


    Hi guys,

    Hopefully I can get a bit of direction for a project I've just started. I'm going to develop a web app where users need to be able to upload and download BLOBs such as image files, document files and spreadsheet files. I'm aware that databases can store files like this in the same way they store small data structures. In researching this on the web it seems that files greater than 1MB are best stored on file systems as opposed to databases. I'm expecting the majority of files to be less than 1MB but some could span to 30 - 40MB or higher and I don't think storing these on the database will be good for performance. I'm not really following how storing in a file system works so maybe you could answer a few questions regarding it.
    1. Does the application specify a directory and the application writes to the specified directory on the server and stores the file metadata in the database?
    2. How would this function on a shared hosting environment?
    3. How is the file system security and backup handled in this case for both shared and dedicated environments?
    4. I don't want to look at going for a paid solution but is the option of using a file share system such as dropbox or sharepoint necessary/viable?
    5. SQL Server 2008 stores Filestream data (database handles the storage of BLOBs) but does it store the files in the database or does it handle the work in saving the file to an external file system? Also is using functionality like Filestream a good approach?

    I expect that at most a maximum of 5GB of files would be uploaded in a year where maybe 100 files are 30 - 40MB in size and most other would be less than or around 1MB and some would be in the 5 - 10MB range.

    My ideal solution would be to just put the BLOBs into a database like I would any other data.

    Also is there libraries available for handling these sort of upload/download transactions for ASP.NET MVC and is freely licensed but not copyleft (e.g. GPL).


Comments

  • Registered Users Posts: 26,574 ✭✭✭✭Creamy Goodness


    store in directories ALWAYS...

    store filename and or path in the db. doesn't matter about shared/dedicated hosting when it comes to backup, YOU'RE RESPONSIBLE FOR YOUR APPLICATION'S DATA. If you use shared hosting treat it as if it could be wiped within 30 seconds and have a back up strategy that suits this.


  • Moderators, Society & Culture Moderators Posts: 9,689 Mod ✭✭✭✭stevenmu


    4. Dropbox or SharePoint could potentially work, if you want to use them as simple file stores you'll need to write code to interact with their web services etc.

    If SharePoint is an option, it might be easier to just do everything through that, rather than write your own web app.

    5. SQL Server 2008 (I believe it needs to be R2), supports Remote Blob Storage (RBS), but if you're storing your database with a shared host it's unlikely that they'll support it.


  • Registered Users Posts: 586 ✭✭✭Aswerty


    store in directories ALWAYS...

    store filename and or path in the db. doesn't matter about shared/dedicated hosting when it comes to backup, YOU'RE RESPONSIBLE FOR YOUR APPLICATION'S DATA. If you use shared hosting treat it as if it could be wiped within 30 seconds and have a back up strategy that suits this.

    When documents are less than 256KB SQL Server gives better performance than NTFS for file read/writes. Between 256KB and 1MB depending on fragmentation of data SQL Server can still give better performance. So always writing to directories doesn't hold true. In my case most files will be under 1MB but a non-insignificant number are greater and some considerably so. This means storing metadata in the database and the file in a directory is probably the best approach in this case.

    With regards to your comments on backing up, I was asking what issues are raised in regards to security and backup on hosted and dedicated servers where files are being stored external to the database. The first thing that comes to mind is a loss of integrity since files are now external to the database and are freely mutable to anyone with access to the file system. I'm well aware that I'm responsible for my own data.
    4. Dropbox or SharePoint could potentially work, if you want to use them as simple file stores you'll need to write code to interact with their web services etc.

    If SharePoint is an option, it might be easier to just do everything through that, rather than write your own web app.

    5. SQL Server 2008 (I believe it needs to be R2), supports Remote Blob Storage (RBS), but if you're storing your database with a shared host it's unlikely that they'll support it.

    I don't have access to SharePoint, I was just looking to see if these tools were a reasonable solution to storing BLOBs for a web app.

    Thanks for the heads up on RBS, I hadn't come across it before when looking at Filestream data. I'd agree that shared hosts probably wouldn't accommodate it. This is a pity since it looked like a nice solution for me. I'm the sole developer in a non software based SME so my expectation is I will end up deploying with a shared host. I'd previously looked at Amazon Micro Instance Servers but I think the application is too demanding and the cost to upgrade to a small server is significant enough to shy away from it.

    Anyways I've been looking up how to approach uploading and downloading files with ASP.NET MVC and things look pretty straight forward. I just have to get up to speed on any security and backup issues with regard to having my data split between a file system and database.

    I've only ever dealt with structured data when persisting data on the server side so it's a bit of a learning curve.


  • Banned (with Prison Access) Posts: 3,455 ✭✭✭krd


    I've noticed there is a data type on MySql - the LONGBLOB. It can hold 4GB

    I'm not sure, but I would assume, the table points to the location of binary, and doesn't store the complete binary in the table, which might cause performance problems. I don't really know. I haven't tried it. Would downloading a 4GB file from MySql cause the server to grind to a halt?

    If there's not huge traffic on the site, would it really make any difference, whether you used the BLOB or the file name?


  • Registered Users Posts: 586 ✭✭✭Aswerty


    From what I see LONGBLOB is is similar to varbinary(max) on SQL Server which stores up to 2GB. So I expect the binary is stored in the database just like varbinary(max) is. The Filestream storage mentioned above allows the database to integrate with the file system so the database only needs to hold the filepath. Based on this StackOverflow answer as of Feb 2010 MySQL has no equivalent of Filestream.


  • Advertisement
  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    What would be the advantages to holding objects like these in a Database. Seem to me to cause more problems than it solves. A secure directory storage system, seems to have all the advantages and none of the disadvantages.


  • Registered Users Posts: 9,294 ✭✭✭markpb


    BostonB wrote: »
    What would be the advantages to holding objects like these in a Database. Seem to me to cause more problems than it solves. A secure directory storage system, seems to have all the advantages and none of the disadvantages.

    Security, integrity, auditing, single backup...


  • Moderators, Society & Culture Moderators Posts: 9,689 Mod ✭✭✭✭stevenmu


    Keep in mind that on shared hosting, your database will be hosted on another shared DB server, so performance won't be fantastic. Also, your DB storage allocation is usually seperate from your file storage allocation. So even if you have say a 50GB file storage allocation, you may still be limited to 2GB per DB.


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    markpb wrote: »
    Security, integrity, auditing, single backup...

    All of those are relatively easy to overcome though no? I mean having multiple backups is all automated. Its not really an issue.


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    stevenmu wrote: »
    Keep in mind that on shared hosting, your database will be hosted on another shared DB server, so performance won't be fantastic. Also, your DB storage allocation is usually seperate from your file storage allocation. So even if you have say a 50GB file storage allocation, you may still be limited to 2GB per DB.

    I would have assumed the database would perform a lot quicker without all those objects in it. The speed of access to the files would be dependent on connection speed and I/O to the file server, the application speed would be independent of that if it was on another database and/or server. I just imagine if you've a could people uploading/downloading a big file how would that effect other users on the system. Off loading I/O load to other machines and/or processing would seem to be a good way of parallel processing.

    I have no experience of it, and none of the practicality of the limits of the hosting. Just idle musing on my part. Perhaps database and application speed isn't effected at all.


  • Advertisement
  • Registered Users Posts: 9,294 ✭✭✭markpb


    BostonB wrote: »
    All of those are relatively easy to overcome though no? I mean having multiple backups is all automated. Its not really an issue.

    True but if you can do it all in the DB with adequate performance, why put work on yourself? Why configure security twice, auditing twice, backups twice, etc. And on top of that, you have the added difficulty in ensuring that the DB and filesystem are consistent with each other.


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    There wouldn't be if there's adequate performance.


  • Banned (with Prison Access) Posts: 3,455 ✭✭✭krd


    BostonB wrote: »
    I would have assumed the database would perform a lot quicker without all those objects in it.

    Why would you make that assumption. I would assume, if the datatype is there, the database engine has some efficient means of dealing with it. Like there's some pointer to the actual binary - and the actual data is not traversed, unless it's explicitly called. Because it would probably be insane to have it any other way.


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    Some info here.

    http://www.freeopenbook.com/mysqlcookbook/mysqlckbk-chp-17-sect-7.html

    I guess its because in my limited dealings with databases, (and my limited knowledge of databases) the less data in it. The faster it is. Also writing and reading lots amounts of data is slower than reading and writing small amounts of data. Obviously you could have a badly written queries and such so its not black and white I realise that.

    If you consider, backup. Increment backup of media on a file system is going to be faster than in a database. Because in a file system, less of it will have changed. In a database you'd have to backup the whole thing.

    I guess there is not right answer. You'd have to try it both ways and measure the impact for your content/application. Someone elses application and content might produce different conclusions.


  • Registered Users Posts: 586 ✭✭✭Aswerty


    BostonB wrote: »
    Some info here.

    http://www.freeopenbook.com/mysqlcookbook/mysqlckbk-chp-17-sect-7.html

    I guess its because in my limited dealings with databases, (and my limited knowledge of databases) the less data in it. The faster it is. Also writing and reading lots amounts of data is slower than reading and writing small amounts of data. Obviously you could have a badly written queries and such so its not black and white I realise that.

    If you consider, backup. Increment backup of media on a file system is going to be faster than in a database. Because in a file system, less of it will have changed. In a database you'd have to backup the whole thing.

    I guess there is not right answer. You'd have to try it both ways and measure the impact for your content/application. Someone elses application and content might produce different conclusions.

    Cheers for the link, I don't plan to use MySQL but I do plan to use an ORM so my DB choice isn't particularly important at this point if I'm not using any DB specific functionality.

    What I plan to do is to get a better estimate of what size files I will be dealing with since the estimate I'm currently working off was extrapolated by getting max number of users who may initially use the application and multiplying this by a worst case scenario. Then I'll review storing to the DB and I'll rule it out if it is clearly the worst choice but if I think I'm on the line between DB and file storage I'll probably go ahead with DB storage. My basis for this is that, as krd mentioned, the DB I'll end up using will handle BLOB storage by pointing to it (SQL Server 2005 does this and I'd imagine so does every modern DB) and the files >1MB I expect will not be downloaded much since they will be on the system because a single repository of all documentation is required.

    Anyways this is a good article that looks at typical BLOB storage, Remote BLOB Storage in SQL 2008 and using RBS with SharePoint.

    Also this article about whether or not store BLOBs to the DB although I imagine this is specific to SQL Server 2005 though it is not explicitly stated.


  • Moderators, Society & Culture Moderators Posts: 9,689 Mod ✭✭✭✭stevenmu


    BostonB wrote: »
    I would have assumed the database would perform a lot quicker without all those objects in it. The speed of access to the files would be dependent on connection speed and I/O to the file server, the application speed would be independent of that if it was on another database and/or server. I just imagine if you've a could people uploading/downloading a big file how would that effect other users on the system. Off loading I/O load to other machines and/or processing would seem to be a good way of parallel processing.

    I have no experience of it, and none of the practicality of the limits of the hosting. Just idle musing on my part. Perhaps database and application speed isn't effected at all.
    I think I'm agreeing with you here, I was making the point that sotring the files in the DB won't have good performance, in part because it will be a shared DB server with many other users. I'd also agree that spreading the load (DB on one server, file store on another) would be the better performing solution.


Advertisement