Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

Database normalisation question (basic question)

Options
  • 18-12-2008 5:25pm
    #1
    Closed Accounts Posts: 12,382 ✭✭✭✭


    Hello

    I hope someone can help me with this. An old college lecturer drilled database normalisation into my brain, so it's something I think I'm a bit OTT about sometimes.

    For example, let's say I have a users table -

    Users table
    User_ID
    Username
    Password
    First_name
    Last_name
    Email_address
    County
    Gender
    Mailing_list

    I want to be able to define whether the user is "active" or "inactive". The simple solution would be to add another field - status - which has a value of 0 or 1. The normalisation solution (I think) is to create a status table and place it's primary key as a foreign key in users -

    Users table
    User_ID
    Username
    Password
    First_name
    Last_name
    Email_address
    County
    Gender
    Mailing_list
    User_status_ID

    Status table
    User_status_ID
    Text

    The reason for this thinking is there is a many to one relationship between users and status.

    Should I do it the simple way, or the normalisation way?

    Thanks.


Comments

  • Moderators Posts: 51,799 ✭✭✭✭Delirium


    It is essentially a user active flag. You could use the simple way and just add the field without the additional table.
    Some of the tables in projects I've done would have had similiar fields. We would use a checkbox to display the value, making it clear what the value means.

    If you can read this, you're too close!



  • Closed Accounts Posts: 12,382 ✭✭✭✭AARRRGH


    Yeah, I know the simple way will work fine, but the perfectionist part of my brain is saying... no, do it the other way... even if it is probably slower.


  • Moderators Posts: 51,799 ✭✭✭✭Delirium


    AARRRGH wrote: »
    Yeah, I know the simple way will work fine, but the perfectionist part of my brain is saying... no, do it the other way... even if it is probably slower.
    If it is only ever going to be an active/inactive status, I'd go with the simple. The other way is better of course if any additional statuses appear, becuase you'll have all the structures in place already.

    If you can read this, you're too close!



  • Registered Users Posts: 26,579 ✭✭✭✭Creamy Goodness


    also taking Into consideration the size of your user table. How many records will it hold? If its a lot then the second table option is a good bet. If its less than a 100 users I'd just use the status flag column


  • Closed Accounts Posts: 12,382 ✭✭✭✭AARRRGH


    Thanks for the reply.

    I suspect it will be large, certainly more than 20,000 users.

    Maybe I should be a good boy and just normalise everything...


  • Advertisement
  • Closed Accounts Posts: 815 ✭✭✭KStaford


    op

    the cost of a boolean field (status) in the users table is very small. Its only a 1bit flag. These flags are also very efficient in terms of processing, reads, search, indexing etc.

    The alternative, is to have a status table. This would have just 2 entries (or more if you ever come up with more status's). The cost here is in joins, reads etc. It also adds complexity to the db.

    Conceptully, one could argue that the relationship between a user and their status is 1:1 not 1:many. Every single occurence of a status has a single user and very single occurence of a user has a single status. There is a third dimension also in that status is temporal.

    Arguing that user:status is a many:1 relationship is a bit like arguing that because "Galway" appears lots of times in my address_city field, then address_city has a 1:many relationship with users and should be given its own table.

    Does putting the status field in the user table meet the normalisation rules
    1NF - get rid of duplicate attributes or groups of attributes - partial check
    2NF - in 1NF and no partial key dependencies - check
    3NF - in 2NF and no transitive functional dependencies - check.

    My advice - Make the status field a 1 bit (yes/no) type field and put it in the users table EVERY TIME!!!!


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    If you had a user table that had like 20 fields, I might call that user_details, then I might set up a user_ID table with user_UID and Status. However as you've so few fields I wouldn't bother.

    Do you actually have 20k users? or 20k customers?


  • Closed Accounts Posts: 12,382 ✭✭✭✭AARRRGH


    Thanks for the replies everyone.

    I do think it's a many:1 relationship as there will only be three status records - inactive, active and banned, but there will be many user records. A user can only have one status at a time.

    The site will have a lot of users, at least 20,000, but I suspect a lot more. It's sort of an adult site, you see... :)

    I know I could just add the status field to the user and give it a value of 0, 1, or 2, but that just feels wrong.


  • Registered Users Posts: 9,557 ✭✭✭DublinWriter


    You really need to Google the process otherwise known as deriving the third normal form.

    You begin by introducing the new field as a separate table then go through the third normal form process and realise that it should be a field in the users table in the first place.

    I find people instinctively 'get' good data-design or need to go through a formalised routine such as third-normal-form to point out to them the bleedin' obvious!


  • Closed Accounts Posts: 815 ✭✭✭KStaford


    You really need to Google the process otherwise known as deriving the third normal form.

    You begin by introducing the new field as a separate table then go through the third normal form process and realise that it should be a field in the users table in the first place.

    I find people instinctively 'get' good data-design or need to go through a formalised routine such as third-normal-form to point out to them the bleedin' obvious!

    If you're saying that normalisation is just another bloated way for academics to earn their living by writing way too much about it and spending way too much time talking about it, I agree.


  • Advertisement
  • Registered Users Posts: 517 ✭✭✭lisbon_lions


    Well, you are going to have repeating groups, no matter which way you do it, ie loads of status 1's, 2's or whatever values you permit.

    However, the simple way may suffer from insert anomolies where lets say your status (what's the plural of that word?) range from 1-5 and a user at the front end places in a 6? Whereas your normalisation method would not allow for this. Something to consider.


  • Registered Users Posts: 11,980 ✭✭✭✭Giblet


    KStaford wrote: »
    If you're saying that normalisation is just another bloated way for academics to earn their living by writing way too much about it and spending way too much time talking about it, I agree.

    Hey man, if you don't do 20 joins to get a users' details, you aren't doing it right.


  • Closed Accounts Posts: 815 ✭✭✭KStaford


    also op

    Remember that a simple 1bit field (yes/no field) can actually have 3 values - YES, No and NULL so it will work in your case. This is one of th emain reasons why I recommend it.


Advertisement