Help designing WebSite

  • 04-04-2007 11:36am
    #1
    Registered Users Posts: 500 ✭✭✭warrenaldo


    I am at the very early stages of thinking about an idea for a website. Usually I'm a JSP/Servlet guy, doing some ASP.NET at the moment, with relational DBs.

    The site I am developing handles a massive amount of data: 100,000+ records of people, with lots of info about each person and maybe an image for each. The records are stored in MS Access at the moment.

    The site itself will be very simple: just show the list, offer a search based on fields (name, title, etc.) and allow you to click on a name to get more info about the person.

    Given the large volume of data, I was wondering what the best approach would be. I don't think a relational DB would be the fastest solution, and speed is an issue.
    It needs to be able to handle a high volume of users.

    Any tips would be appreciated.


Comments

  • Registered Users Posts: 1,127 ✭✭✭smcelhinney


    How often would the information be updated?

    The only non-relational DB solution I can think of is XML/XSLT, but that's slightly inefficient in itself.


  • Registered Users Posts: 500 ✭✭✭warrenaldo


    It could all be static data - it's never updated! The problem is transferring that amount of data - or where do I put it?


  • Registered Users Posts: 139 ✭✭Higgsy


    Hi warrenaldo

    Why do you think a relational database would not be fast enough? It is fast enough for Google and Yahoo, so I don't see a reason why it would not be fast enough for you.

    Look at Postgres or MySQL (both open source).

    Also maybe look at in memory databases like Derby.

    Develop the solution, test it and then determine if you have a performance issue. If you write the data access layer correctly, swapping the storage behind it later should take minimal changes.
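
    A minimal sketch of that data-access-layer idea, written in Python only for brevity (the thread's own stack is Java/ASP.NET). The names PersonStore and SqlitePersonStore, the people table and its columns are all hypothetical; the point is only that the pages talk to an interface, so the storage behind it can change later.

```python
from __future__ import annotations

import sqlite3
from typing import Optional, Protocol


class PersonStore(Protocol):
    """Hypothetical interface: the only thing the web pages are allowed to call."""

    def find_by_surname(self, surname: str) -> list[tuple[int, str]]: ...

    def get(self, person_id: int) -> Optional[tuple[int, str, str]]: ...


class SqlitePersonStore:
    """One possible backend; a flat-file or full-text backend could replace it."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find_by_surname(self, surname: str) -> list[tuple[int, str]]:
        rows = self.conn.execute(
            "SELECT id, name FROM people WHERE surname = ? ORDER BY name",
            (surname,),
        )
        return rows.fetchall()

    def get(self, person_id: int) -> Optional[tuple[int, str, str]]:
        return self.conn.execute(
            "SELECT id, name, title FROM people WHERE id = ?", (person_id,)
        ).fetchone()


def build_demo_store() -> SqlitePersonStore:
    """Create an in-memory database with three made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, name TEXT, surname TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO people (name, surname, title) VALUES (?, ?, ?)",
        [
            ("Ann Byrne", "Byrne", "Dr"),
            ("Tom Byrne", "Byrne", "Mr"),
            ("Sean Kelly", "Kelly", "Mr"),
        ],
    )
    return SqlitePersonStore(conn)


# Calling code depends only on the PersonStore interface.
store: PersonStore = build_demo_store()
print(store.find_by_surname("Byrne"))
print(store.get(3))
```

    If the database ever did turn out to be the bottleneck, only a new class implementing the same two methods would need to be written; the pages calling it would stay as they are.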

    Higgs


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    warrenaldo wrote:
    I don't think a relational DB would be the fastest solution.
    Why not?

    I ask because I find it hard to understand how you can be asking for ideas for a faster way but also suggesting that you have a reason to believe one exists.
    The problem is transferring that amount of data - or where do I put it?
    Transferring what amount of data?

    Surely you're planning that the search runs server-side, and only the results get sent to the client.

    How is there a transfer problem with relation to quantity in that?


  • Registered Users Posts: 500 ✭✭✭warrenaldo


    For everything I have ever done I've used a relational DB, so I have no real problem with them. But I just "don't know" how well they scale to this sort of thing.
    It's the only real way I know how to do this stuff, so it's probably the way I would have gone on my own.
    But I'm just wondering what other options are really out there for me.

    Massive amounts of static data, needing to be accessed very quickly - I'm sure there are alternatives.


  • Registered Users Posts: 2,781 ✭✭✭amen


    Ahh, I think I see what you are getting at. The individual user pages are static and don't change?

    In that case I would have a table that gives me the name of the page/index/ref etc., present that to the user
    and then redirect to that page.

    If the pages are fairly static and follow a template, you could store the info in a DB, periodically regenerate the pages and store them on your web site.
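
    A rough sketch of the "regenerate the pages from the DB" idea, in Python for brevity. The people table, the one-line template and the person_<id>.html file naming are assumptions made up for the example, not anything specified in the thread.

```python
import sqlite3
import tempfile
from html import escape
from pathlib import Path

# Hypothetical one-line template; a real site would use a proper template file.
PAGE_TEMPLATE = "<html><body><h1>{name}</h1><p>{title}</p></body></html>"


def regenerate_pages(conn: sqlite3.Connection, out_dir: Path) -> int:
    """Write one static HTML file per person and return how many were written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for person_id, name, title in conn.execute("SELECT id, name, title FROM people"):
        # Escape the values so stray markup in the data cannot break the page.
        html = PAGE_TEMPLATE.format(name=escape(name), title=escape(title))
        (out_dir / f"person_{person_id}.html").write_text(html, encoding="utf-8")
        count += 1
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO people (name, title) VALUES (?, ?)",
    [("Ann Byrne", "Dr"), ("Tom <Tallaght> Kelly", "Mr")],
)

out_dir = Path(tempfile.mkdtemp()) / "people"
written = regenerate_pages(conn, out_dir)
print(written, sorted(p.name for p in out_dir.iterdir()))
```

    Run from a scheduled job, something like this would leave the web server handing out plain files, with the database only touched when the pages are rebuilt.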

    Btw, a properly designed relational DB can handle more data than you will ever handle/create on your own. If you are getting to the stage where the DB is a problem, you will probably have a whole team working for you, unless you have a design issue.


  • Registered Users Posts: 995 ✭✭✭cousin_borat


    IBM DB2 can now natively store XML. This means that you can run native SQL and/or XML queries.

    If you're storing large amounts of data and potentially images then that may be a consideration if you are looking at commercial databases.

    It's used extensively in Health care now where patient data is stored in Relational Format and then medical reports and data such as X-Rays is stored as XML.


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    warrenaldo wrote:
    For everything I have ever done I've used a relational DB, so I have no real problem with them. But I just "don't know" how well they scale to this sort of thing.
    The scalability of an RDBMS is almost unlimited. In fact, the scalability of your search function will generate problems before a database would.
    Massive amounts of static data - needed to be accessed very quickly
    You keep repeating this mantra.

    First off - I wonder if what you really mean is "massive amounts of static data, some small subset of which needs to be identified and retrieved very quickly", because I think that's more accurate to what you're describing.

    Secondly, you have yet to define what "very quickly" means. Nanoseconds? Microseconds? Sub-second? Under 3 seconds?

    Thirdly, your description says you have hundreds of thousands of records, but you want to be able to just "click on the name of a user". Does this mean you effectively want to present a list of hundreds of thousands of links which the user can scroll through and click on anything in it? If so, then your problem of "very quickly" is going to be very quickly overshadowed by the length of time it takes to send that list to the client in the first place.

    Finally, you say the data is static. Do you mean it will never change, or that it changes rarely (like maybe it is reloaded once a month)? If the latter, is the speed/availability of the system a concern while the data is being refreshed?
    I'm sure there are alternatives.
    There are always alternatives. There may not be better alternatives. There may not be better alternatives which you can afford.

    But the first thing is to find exactly what it is you want alternatives to. As I mentioned above, your problem is only vaguely defined*

    You could, for example, load the list of people piecemeal using asynchronous methods (e.g. AJAX). Depending on the number of concurrent users of the system and the resources you have at hand, you could pre-fetch the data and build the pages for the asynchronously-loaded info, in case the user does click on a link in whatever set of data you've just sent them. With enough resources, you could load all 100,000+ records into memory - despite what you think, it's not that massive (Oracle, for example, will often talk about small-to-medium tables and give an example of a table with 3-5 million rows).

    But while you talk in vague terms such as "very fast", and don't describe the functionality, it's gonna be tough to give you more than a vague answer that may be totally inapplicable to the real problem you're trying to solve.

    *I don't mean to sound harsh on this. I've spent the last two weeks fighting with the people who are supposedly specifying functionality for a system I'm on the dev team of. They will give us a vague description of what they want, but when you start asking the detailed questions they look shocked, as though it's not their job to tell us what it is they want. They just want to tell us things are wrong... not what is right. If you can't clearly state what you want, you will be lucky if you ever get it.


  • Registered Users Posts: 500 ✭✭✭warrenaldo


    The system would be very similar to a census system. It will show a very large list of records, each of which may or may not contain an image.
    The data would NEVER change.

    Now, it won't show a direct list of all records. The records will be subdivided by location (for instance) and then further by location, e.g. Dublin, then Tallaght,
    and then you get a list of people in Tallaght, Dublin.
    So "some small subset of which needs to be identified and retrieved very quickly" is about right.

    When I say very fast, I mean fast enough that there is no time lag when accessing the site - I want it to run seamlessly.
    I was just wondering whether accessing this amount of data and transferring it would slow me down, which may be a possibility judging from your comment.

    The way I would do it is an RDBMS, and from the sound of the posts so far it seems the way to do it. But out of interest I was wondering about other possibilities.

    Presenting a list of 100 thousand records is a possibility. If this were to slow down the system, could you offer any possible solutions to my dilemma?
    Splitting it up into smaller subsets (Dublin, Tallaght, somewhere) is not really an option.

    This is the reason I'm looking for help. Thanks for the comments so far.


  • Closed Accounts Posts: 25,848 ✭✭✭✭Zombrex


    A relational database is the way to go. MS Access though isn't. You want a proper database system, such as MS SQL Server, PostgreSQL or MySQL.

    I work on a database that stores approx 6 million entries spread over a number of tables, with about 10,000 added each hour (it's weather data from around the world). We use PostgreSQL running on a dual-processor server with 2GB of memory and a RAID setup. It can return results for most of the complex SQL queries we run on it in under a second.

    To make a database go from taking an hour to do something to taking a second it is all to do with how you design the database. If something is taking ages to run it is far more likely your database isn't designed properly than something else like your server being slow.
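
    To make the design point concrete, here is a small sketch in Python using SQLite (not the PostgreSQL setup described above). The people table, its county/town columns and the index name are made up for the example, and the rows are synthetic. It asks the database how it plans to run the same query before and after an index is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, county TEXT, town TEXT)"
)
# 100,000 synthetic rows: 10 counties with 20 towns each (made-up data).
conn.executemany(
    "INSERT INTO people (name, county, town) VALUES (?, ?, ?)",
    ((f"Person {i}", f"County {i % 10}", f"Town {i % 200}") for i in range(100_000)),
)

QUERY = "SELECT id, name FROM people WHERE county = ? AND town = ?"
ARGS = ("County 3", "Town 13")


def plan(sql: str) -> str:
    """Return SQLite's own description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ARGS).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)  # no index yet: every row has to be scanned
conn.execute("CREATE INDEX idx_people_location ON people (county, town)")
plan_after = plan(QUERY)  # with the index: a direct lookup

matches = conn.execute(QUERY, ARGS).fetchall()
print(plan_before)
print(plan_after)
print(len(matches))
```

    The first plan reports a scan of the whole table, while the second names idx_people_location. The data is the same in both cases; only the design changed.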

    If you are dead set against using an RDBMS: the last place I worked used a flat-file system for storing records in the commercial program they sold. They did this because they didn't want to ship an RDBMS with their software. They also claimed that it was quicker, but I think that was because none of them knew how to set up an RDBMS properly. They were happy with this, but every time they wanted to change something it took 3 guys the better half of a week to make and test any little change.

    Using a flat-file setup is very complex and requires a lot of work. If you have any kind of relations between your data (i.e. more than one table) it is a nightmare. Which is why the RDBMS was invented in the first place.


  • Registered Users Posts: 2,781 ✭✭✭amen


    We use PostgreSQL running on a dual-processor server with 2GB of memory and a RAID setup. It can retrieve most of the complex SQL queries we run on it in under a second
    Which you can do with any properly designed and tuned database, but if your query is bringing back 10,000 records and you want to display this data on the client, this is still going to take a lot of time.

    As Bonkey suggested, why not use some sort of Ajax? Maybe start off with a list box of, say, countries; then, when you select the country, populate a county list box, then a city one, etc.

    what exactly is the app used for?
    what languages are you going to use?
    If the data is relatively static, can you have a job run at night to produce standard result sets?
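
    The drill-down list boxes suggested in this post could be backed by server-side queries along these lines. This is a sketch in Python with SQLite, assuming a hypothetical people table with county and town columns (borrowing the Dublin/Tallaght example from earlier in the thread). Each function stands in for what one Ajax request would call, and each returns only a small result.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, county TEXT, town TEXT)"
)
conn.executemany(
    "INSERT INTO people (name, county, town) VALUES (?, ?, ?)",
    [
        ("Ann Byrne", "Dublin", "Tallaght"),
        ("Tom Kelly", "Dublin", "Tallaght"),
        ("Mary Walsh", "Dublin", "Swords"),
        ("Sean Ryan", "Cork", "Cobh"),
    ],
)


def counties() -> list[str]:
    """Values for the first list box."""
    rows = conn.execute("SELECT DISTINCT county FROM people ORDER BY county")
    return [row[0] for row in rows]


def towns(county: str) -> list[str]:
    """Values for the second list box, once a county has been picked."""
    rows = conn.execute(
        "SELECT DISTINCT town FROM people WHERE county = ? ORDER BY town", (county,)
    )
    return [row[0] for row in rows]


def people_in(county: str, town: str) -> list[tuple[int, str]]:
    """The short list of people that is finally shown to the user."""
    return conn.execute(
        "SELECT id, name FROM people WHERE county = ? AND town = ? ORDER BY name",
        (county, town),
    ).fetchall()


print(counties())
print(towns("Dublin"))
print(people_in("Dublin", "Tallaght"))
```

    Since the data is static, the results of counties() and towns() are also obvious candidates for the nightly pre-computed result sets mentioned above.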


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    I think your main limitation will be the hardware you can afford to run it tbh.


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    BostonB wrote:
    I think your main limitation will be the hardware you can afford to run it tbh.
    Pretty much.

    With enough money spent, a flat-file system with 6 million records may run faster than the same system with an RDBMS with 400 records. But you'd want to be spending huge amounts on the former (or using 1980s hardware on the latter).

    RDBMSs have a reputation for complexity. Extreme normalisation of a system can actually cause the generation of slower queries, and particularly where purely static data (i.e. data that will never change, ever) is concerned, it can have time costs.

    Although the temptation is to design a database that suits all needs, you have to build the database that suits its activity - Wicknight gives a good example of what he works on. It's heavy on changes, but not an exceptionally massive database. So that database should be optimised/designed primarily for update efficiency, perhaps with a replicated warehouse database for reporting purposes. But I only say that based on his two lines. Designing databases for small-scale use is easy. Designing a database architecture for large-scale use is a subproject all of its own.

    Any web developer can design and build a perfectly good database for use on his website. But if you're looking for enterprise efficiency and quality, you want a dedicated DB developer to work with you.


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    We've a few big databases, a few million records each, but all text data. We've a lot of heavy-duty hardware behind them: distributed DB and application servers and multiple disk clusters. But there's also a lot of activity, around 1,000 active sessions at any one time. Our main bottleneck is often the users' net connections. While we do a lot of our own DB optimization, we regularly get specialists in to tweak it still further.

    I'm not a DBA, but the DB the OP is talking about seems a massive project. On a hunch, and maybe I'm wrong, I doubt there's the budget to do it.


  • Closed Accounts Posts: 25,848 ✭✭✭✭Zombrex


    amen wrote:
    Which you can do with any properly designed and tuned database, but if your query is bringing back 10,000 records and you want to display this data on the client, this is still going to take a lot of time.

    Well, I wouldn't recommend anyone try to display 10,000 records to the user at one time, not just from a speed point of view but also from an HCI point of view. If you are displaying that much information, your queries are doing something wrong.


  • Registered Users Posts: 500 ✭✭✭warrenaldo


    At the moment the project is not even started - just looking into it at present. I want to get some ideas on how to do this. Usually I would design a website using a simple DB (SQL Server) and use JSP, Java Beans and Servlets. I'm starting to use ASP.NET these days, but the websites are nothing too complex - they do not get extensive use and don't need to support massive numbers of users. This one does.
    I will NOT be doing the project myself - I don't have the experience - it will be outsourced. But I may be asked for a prototype (no speed or performance requirements) and I would like to have a few varied opinions on ways of doing this.

    From the posts so far, I feel an RDBMS is a good way to go, but a pro DB developer would be needed to design and optimize the data retrieval, probably using things like indexes.
    The other way to go would be flat files, which may be faster.

    Thanks for the help so far.


  • Users Awaiting Email Confirmation Posts: 351 ✭✭ron_darrell


    I'm a bit confused by your question and the fact that even though you have heard from many sources that an RDBMS is the way to go, you still sound unsure. Are you hearing from someone in your organisation that there is a more efficient way?

    I'm also assuming that ASP.NET is being enforced as the design mechanism, either internally in the organisation or as requested by the client. My understanding of ASP.NET is that it already implements many techniques for optimising data retrieval and data display (paged data views etc.). Also, an SQL Server edition has been developed by MS to work in tandem with its ASP.NET software and would be the recommended DB environment for the type of project you are working on.

    To be honest for only 10,000 records do you really need to go to .NET at all? While there are many useful tools in the .NET environment for a project as small as this a well designed ASP site with SSL for security would probably be more than sufficient.

    If you need any help PM me at your convenience.

    -RD


  • Registered Users Posts: 11,980 ✭✭✭✭Giblet


    I'd look into using collections and pagination as well if you want to show a lot of results.
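
    One way to read "pagination" here is that the database hands back a single page of rows per request rather than the whole list. A minimal sketch in Python with SQLite follows; the people table, the page size of 25 and the function names are assumptions for illustration only.

```python
import sqlite3

PAGE_SIZE = 25  # assumed page size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    ((f"Person {i:06d}",) for i in range(100_000)),
)
conn.execute("CREATE INDEX idx_people_name ON people (name)")


def fetch_page(page: int, page_size: int = PAGE_SIZE) -> list:
    """Return one page of (id, name) rows; pages are numbered from 1."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, name FROM people ORDER BY name, id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


def page_count(page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to show every row."""
    total = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    return (total + page_size - 1) // page_size


print(fetch_page(1)[0])
print(fetch_page(3)[0])
print(page_count())
```

    Only 25 rows ever cross the wire per request, whatever the size of the table. For very deep pages, remembering the last name shown and querying past it tends to scale better than a large OFFSET, but the simple form is enough to show the idea.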


  • Registered Users Posts: 2,931 ✭✭✭Ginger


    One of the projects I was working on was a mid-sized system of 5 million+ records. Most of the data was static once it was put in. Search times for any given search had to be less than 1 second; it's currently 0.5 seconds, with a display time of less than 2. It's updated with about 1,000 records per day.

    This was using a system called Witango, similar to ASP/PHP in style.

    When the DB was first designed, it was over-normalised for the system and queries took 90 seconds. I redesigned the DB and now it's back within acceptable margins. It's running on a dual-processor box with 4GB of RAM in a RAID setup.

    Basically, a well-designed DB with the correct optimisation strategies will give you the results back nice and quick.

    The main bottleneck on the system is the speed of the internet connection; it can seem very slow on a 56K modem, as in 10 seconds, due to images.

    So this also depends on where it will be deployed: intranet or internet.


  • Closed Accounts Posts: 4,943 ✭✭✭Mutant_Fruit


    To be honest for only 10,000 records do you really need to go to .NET at all? While there are many useful tools in the .NET environment for a project as small as this a well designed ASP site with SSL for security would probably be more than sufficient.
    Erm, how does the amount of records stored in a database affect the language/framework chosen to develop in?

    There's absolutely no reason to suggest ASP over ASP.NET. If he said he was integrating with an existing solution that was entirely ASP-based, I might be inclined to agree that ASP would be better, because you could integrate more easily and it would involve the fewest server-side changes. But since that wasn't said, I don't really see why ASP is a better choice because the *database* has a few hundred thousand records.

    And yes, a database will more than likely be a *lot* faster than a flatfile setup and certainly easier to maintain.


  • Users Awaiting Email Confirmation Posts: 351 ✭✭ron_darrell


    Very simply, the reason would be that a big project would need more of the features that are native to .NET. The number of records indicates that this is not a heavy-duty system being developed, so why use dynamite when a hammer will do? Too many people try to build Sistine Chapels when a village church is all that's needed. Ever heard of KISS?

    -RD


  • Closed Accounts Posts: 4,943 ✭✭✭Mutant_Fruit


    Very simply the reason would be a big project would need more of the features that are native to .NET.
    That's a non-argument. The fact that .NET has a lot of built-in classes does not mean that any solution using ASP.NET is bloated or "dynamite". ASP.NET would be a perfect choice for a small project if you are planning on learning how it all works. I think ASP has far too many features and is unsuitable for web dev :rolleyes:

    KISS doesn't really apply to the language chosen to develop in. I don't think that "keep it simple, stupid" can be interpreted as "use ASP, not ASP.NET".

    It's like you're advocating using VB 5 over VB 6 because VB 6 is too complicated :rolleyes:


  • Registered Users Posts: 500 ✭✭✭warrenaldo


    I am looking at a Java API for text searching called Lucene - it was recommended to me elsewhere. It looks fairly interesting.
    Basically it's a Java API. It seems to just create a massive index of files in a directory structure and gives you the ability to search them.
    Anyone know anything about it?

    Im not discounting RDBMS at all - just gathering as much info as possible.


  • Closed Accounts Posts: 4,943 ✭✭✭Mutant_Fruit


    If you want full-text search, then Lucene is your man. Otherwise not really (from what I know).
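
    For anyone wondering what "full-text search" buys you over a column lookup, the core data structure is an inverted index: a map from each word to the set of records containing it. The toy below is written in Python and is NOT Lucene's API; the TinyIndex class and the sample records are invented purely to show the idea. Lucene adds analysis, ranking and on-disk storage on top.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: maps each word to the ids of the records containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[int]:
        """Return ids of records that contain every word in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so the index itself is never modified.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
index.add(1, "Ann Byrne, teacher, Tallaght, Dublin")
index.add(2, "Tom Kelly, farmer, Cobh, Cork")
index.add(3, "Mary Byrne, nurse, Swords, Dublin")

print(sorted(index.search("byrne dublin")))
print(sorted(index.search("Byrne nurse")))
print(sorted(index.search("galway")))
```

    A search by exact name, title or location, as described at the start of the thread, does not need any of this; an ordinary indexed column covers it.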


  • Users Awaiting Email Confirmation Posts: 351 ✭✭ron_darrell


    That's a non-argument. The fact that .NET has a lot of built-in classes does not mean that any solution using ASP.NET is bloated or "dynamite". ASP.NET would be a perfect choice for a small project if you are planning on learning how it all works. I think ASP has far too many features and is unsuitable for web dev :rolleyes:

    Oh where to start.
    1. You can use any language, no matter how complicated or inappropriate and create a small project with it in order to learn. That doesn't mean it is the most appropriate language to write the project in.
    2. ASP.NET has a much bigger learning curve and many more features than its little brother ASP.
    3. It also requires a development environment (and while you can download a free express version from MS it is far from a full development solution).
    4. Also you will experience a shorter turnaround writing in ASP, and so can develop a prototype more quickly and easily.
    5. Finally you can upgrade code from ASP to .NET reasonably quickly and that path is also a good way to quickly get a feel for .NET.
    KISS doesn't really apply to the language chosen to develop in. I don't think that "keep it simple, stupid" can be interpreted as "use ASP, not ASP.NET".

    KISS applies to everything. Too many IT professionals have jumped on the .NET bandwagon simply because it is there instead of looking on case by case basis and seeing if it is appropriate. I'm not dissing .NET. It has amazing potential. But by no means should it be seen as the only way to do things.
    It's like you're advocating using VB 5 over VB 6 because VB 6 is too complicated :rolleyes:

    That is just the most ridiculous argument I have ever heard. You are not comparing like with like.

    -RD


  • Closed Accounts Posts: 4,943 ✭✭✭Mutant_Fruit


    1. You can use any language, no matter how complicated or inappropriate and create a small project with it in order to learn. That doesn't mean it is the most appropriate language to write the project in.
    I agree 100%. But that doesn't mean you shouldn't use language X because it is "inappropriate". The fact that the project is small means it's an ideal candidate for experimentation if you want to try a new language.
    2. ASP.NET has a much bigger learning curve and many more features than its little brother ASP.
    Just because it has more features does not necessarily mean there's a bigger learning curve. I've heard that argument used in reverse. ASP.NET has more features which makes it easier to use and learn than ASP.
    3. It also requires a development environment (and while you can download a free express version from MS it is far from a full development solution).
    I'd agree with that. It is much easier to use ASP.NET from within Visual Studio (or some other similar IDE) than in a standard text editor.
    4. Also you will experience a shorter turnaround writing in ASP, and so can develop a prototype more quickly and easily.
    Those are the exact arguments that are used to encourage developers to switch from ASP to ASP.NET. I'd also go as far as to say that not only can you prototype faster in ASP.NET, but you can also get a finished product faster.
    5. Finally you can upgrade code from ASP to .NET reasonably quickly and that path is also a good way to quickly get a feel for .NET.
    In this case it would be better to go straight to ASP.NET rather than writing in ASP then converting to ASP.NET, wouldn't you agree?

    Out of interest, does ASP have a datagrid control which supports directly binding data from an SQL query into a table suitable for a human to view? I honestly don't know, but ASP.NET does.

    It's as simple as:
    grid.DataSource = myDataTableFromMyQuery;
    grid.DataBind();

    Maybe comparing VB5 to VB6 was a bit odd, but comparing VB6 to VB7 would be the same as comparing ASP to ASP.NET. I don't think you'd recommend using VB6 over VB7, would you?

