Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

MS Excel filter help

Options
  • 22-08-2009 8:58pm
    #1
    Closed Accounts Posts: 19,080 ✭✭✭✭


    In Excel I have a big list of data. There are 4 columns.

    Some of these rows will be duplicated. I want to remove all the rows which are duplicates and also the original rows leaving me with only the rows which are unique.

    i.e.
    ROW 1: a b c d
    ROW 2: a b j k
    ROW 3: a b c d
    ROW 4: a j k o
    ROW 5: a b j k

    I want to filter so that I'm only left with "a j k o", in this case ROW 4.
    ROW 4: a j k o


    Can anyone help out with the formula? I have tried advanced filter to show "unique records only" but this returns which is not what I want.
    ROW 1: a b c d
    ROW 2: a b j k
    ROW 4: a j k o

    Can anyone help out? Thanks


Comments

  • Registered Users Posts: 78,432 ✭✭✭✭Victor


    How about something like this?

    Normally when I'm doing tests like this I have a single long list, for example I'm looking for all the unique words in a doument. So I'm not sure if this suits your needs.


  • Closed Accounts Posts: 4,013 ✭✭✭kincsem


    This isn't elegant. I'm assuming the data is in columns A, B, C, D.
    • In column E put a formula =A2&B2&C2&D2 and copy it down to the end.
    • In column F put the formula =COUNTIF($F2$F9,E2) and copy it down (adjust the formula to all of the data in column F).
      This will give numbers e.g 1, 2, 3.
    • Then Data/Filter/Autofilter for all the data on the spreadsheet.
    • Click on the filter tab at the top of culumn F, select custom, "is greater than", and type in 1.
      This will show all rows with more than one occurance.
    • Then Edit/Delete will delete those rows. Then Data/Filter/Autofilter will remove the filter and leave the rows with one occurance.


  • Closed Accounts Posts: 19,080 ✭✭✭✭Random


    @Victor
    Not sure I can really work ths out... can you explain a little more?

    @kincsem
    The formula you want me to put into Column F isn't working. Getting the standard Excel forumla error. I tried it as you have it and I also tried =COUNTIF($F2$F3000,E2) (because I have 3000 rows of data). I'm not really sure what it should be doing though, could you explain?

    Thanks


  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    have you tried using access - the basic sql queries for this would be very simple


  • Registered Users Posts: 59,623 ✭✭✭✭namenotavailablE


    The formula you want me to put into Column F isn't working. Getting the standard Excel forumla error.

    You should have a colon in the formula as follows:
    [B]=COUNTIF($F2[COLOR=Red]:[/COLOR]$F3000,E2)[/B]
    

    This formula will counbt the number of times that the data in cell E2 is repeated in the range F2 to F3000. If you copy it down it will identify all duplicates in the E column.


  • Advertisement
  • Closed Accounts Posts: 19,080 ✭✭✭✭Random


    I have MS Access 2003. Can I use that to manipulate the data I want and then put it back in Excell again?


  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    yes - just open the excel file from access - save table as table_name, then in sql

    select distinct * from table_name;


  • Closed Accounts Posts: 19,080 ✭✭✭✭Random


    I want to select all distinct rows, so shouldn't I have

    SELECT distinct col1,col2,col3,col4 from sheet1;

    ???


  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    Random wrote: »
    I want to select all distinct rows, so shouldn't I have

    SELECT distinct col1,col2,col3,col4 from sheet1;

    ???

    same thing: * is all columns


  • Closed Accounts Posts: 19,080 ✭✭✭✭Random


    So going back to my original example:

    Assuming I have this
    ROW 1: a b c d
    ROW 2: a b j k
    ROW 3: a b c d
    ROW 4: a j k o
    ROW 5: a b j k

    Will I be left with this
    ROW 4: a j k o

    or this:
    ROW 1: a b c d
    ROW 2: a b j k
    ROW 4: a j k o


  • Advertisement
  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    Random wrote: »
    So going back to my original example:

    Assuming I have this
    ROW 1: a b c d
    ROW 2: a b j k
    ROW 3: a b c d
    ROW 4: a j k o
    ROW 5: a b j k

    you will be left with this - duplicated rows not returned

    ROW 1: a b c d
    ROW 2: a b j k
    ROW 3: a j k o


  • Closed Accounts Posts: 19,080 ✭✭✭✭Random


    That's not what I want though. I only want to see the unique rows. If the row exists more than once I don't want it returned at all. I only want to see the unique rows.


  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    my bad

    try this

    select f1,f2,f3,f4
    from sheet1
    group by f1,f2,f3,f4
    having count(f1 and f2 and f3 and f4) < 2;


  • Closed Accounts Posts: 19,080 ✭✭✭✭Random


    That last one appears to have done what i want except there's one further thing i neglected to mention. . .

    Some rows are coloured red and some are coloured black in excel. This formatting appears to be lost in access. Is it possible to keep this in access at all?

    Thanks


  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    Sorry dude - my lack of Excel knowledge is why I use Access


  • Closed Accounts Posts: 5,096 ✭✭✭--amadeus--


    Best bet would be to write a macro. I'm hungover and skiving so if you can't crack it I'll have a look later but sort teh data and get the code to compare the row it's in with the row below. If they match then select and compare the next row down. Repeat until you find a non matching row, then delete all selected rows.


  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    :rolleyes:

    hungover?

    still drunk morelike


  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    Random wrote: »
    That last one appears to have done what i want except there's one further thing i neglected to mention. . .

    Some rows are coloured red and some are coloured black in excel. This formatting appears to be lost in access. Is it possible to keep this in access at all?

    Thanks

    Actually you could create a new column, and flag all black rows as 1 and all red as 2 - then when you want to see these in sql query just include
    "and column = 1" or to exclude "and column <> 1"


  • Closed Accounts Posts: 5,096 ✭✭✭--amadeus--


    john47832 wrote: »
    :rolleyes:

    hungover?

    still drunk morelike

    Wow, constructive post

    I have a snippet of code in XL that deletes all duplicates and it works in exactly that way - compares the value to teh value below and if it finds a match it deletes the row. Dirty? Yes. Quick and effective? Yes. Adaptabel to suit the OP, absolutley.

    The OP wanted an XL solution to his problem, I suggested one. No need for :rolleyes:


  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    :rolleyes:


  • Advertisement
  • Closed Accounts Posts: 5,096 ✭✭✭--amadeus--


    john47832 wrote: »
    Actually you could create a new column, and flag all black rows as 1 and all red as 2 - then when you want to see these in sql query just include
    "and column = 1" or to exclude "and column <> 1"

    :rolleyes:

    (see, I can do that too)

    OP has 3000 rows of data, do you want him to manually amend each one? It'd be as fast to manually delete the duplicates
    Sub del_dupes()
    
        Range("A2").Select
            varrange = ActiveCell.CurrentRegion.Address
                Range(varrange).Sort Key1:=Range("A1"), Order1:=xlAscending
    
        Do Until ActiveCell.Value = ""
        
            If ActiveCell.Value = ActiveCell.Offset(1, 0).Value Or ActiveCell.Value = ActiveCell.Offset(-1, 0).Value Then
            
    [I][B]            ActiveCell.Offset(0, 1).Value = "d"[/B][/I]
                
            End If
            
            ActiveCell.Offset(1, 0).Select
            
        Loop
        
    [I][B]    Range("b2").Select[/B][/I]
        
           Do Until ActiveCell.Offset(0, -1).Value = ""
                
                If ActiveCell.Value = "d" Then
                    
                    varaddy = ActiveCell.Address
                    ActiveCell.EntireRow.Delete
                    Range(varaddy).Select
                
                Else
                
                    ActiveCell.Offset(1, 0).Select
                    
                End If
                
            Loop
                    
    
    End Sub
    

    OP - this is a snippet of code that will do exactly what you require in XL and will maintain teh formatting.

    I have assumed that the data starts in Row 2. The code will sort teh data and if it finds a match will mark that row for deletion with a "d". To alter where this flag goes change the offset line (1st bold & italiced line)

    EG I have it going 1 col across (into Col B), to make it 2 cols acrossd (Col C) change teh code from:

    ActiveCell.Offset(0, 1).Value = "d"

    to

    ActiveCell.Offset(0, 2).Value = "d"

    You will also need to change teh address it checks in - I am checking Col B but if you move it x num of cols over you will need to update that as well (2nd bold & italic line)

    I'm assuming you know how to get into the VBA editor, if not let me know and I'll post instructions.

    [edit - yes there are cleaner and faster ways to code this but it's throwaway code not an academic excercise]


  • Closed Accounts Posts: 2,696 ✭✭✭mark renton


    Now cmon dude - I was :rolleyes: at the hangover n skiving part

    if this is going to turn into a competition between Excel and Access you will not win - compare what you had to do against an sql query


  • Closed Accounts Posts: 5,096 ✭✭✭--amadeus--


    john47832 wrote: »
    Now cmon dude - I was :rolleyes: at the hangover n skiving part

    if this is going to turn into a competition between Excel and Access you will not win - compare what you had to do against an sql query

    Ahh fair enough - I thought you were taking a pop at my suggesting a macro.

    Of course if the OP was paying me I could have written the SQL directly in VBA and had teh best of both worlds :D


  • Closed Accounts Posts: 19,080 ✭✭✭✭Random


    Alt+F11 puts me into the VB script thing, but not really sure where to go from there. I'm gonna have a look at the earlier Excel thing again and see if it can help me as it may work better than SQL in this case.


  • Closed Accounts Posts: 19,080 ✭✭✭✭Random


    kincsem wrote: »
    This isn't elegant. I'm assuming the data is in columns A, B, C, D.
    • In column E put a formula =A2&B2&C2&D2 and copy it down to the end.
    • In column F put the formula =COUNTIF($F2$F9,E2) and copy it down (adjust the formula to all of the data in column F).
      This will give numbers e.g 1, 2, 3.
    • Then Data/Filter/Autofilter for all the data on the spreadsheet.
    • Click on the filter tab at the top of culumn F, select custom, "is greater than", and type in 1.
      This will show all rows with more than one occurance.
    • Then Edit/Delete will delete those rows. Then Data/Filter/Autofilter will remove the filter and leave the rows with one occurance.
    You should have a colon in the formula as follows:
    [B]=COUNTIF($F2[COLOR=Red]:[/COLOR]$F3000,E2)[/B]
    

    This formula will counbt the number of times that the data in cell E2 is repeated in the range F2 to F3000. If you copy it down it will identify all duplicates in the E column.


    I've adjusted the formula in E to read "=A2&B2&C2&D2" the whole way down.
    I've adjusted F to read "=COUNTIF($F2:$F5977,E2)" the whole way down (there's actually 5977 rows).
    All of the F rows are "0"?

    Should the formula in F not read $E$2 and $E$5977? (as we're comparing the E column and we want to keep the range static also?


  • Closed Accounts Posts: 19,080 ✭✭✭✭Random


    kincsem wins the prize* in the end. With a few modifications (see my above post) I went with his solution and it appears to have worked out.

    Thanks everyone else for the suggestions.... turns out it was all pretty simple in the end.

    Thanks again.







    *There's no prize .. sorry :D


  • Closed Accounts Posts: 4,013 ✭✭✭kincsem


    Random wrote: »
    @Victor
    Not sure I can really work ths out... can you explain a little more?

    @kincsem
    The formula you want me to put into Column F isn't working. Getting the standard Excel forumla error. I tried it as you have it and I also tried =COUNTIF($F2$F3000,E2) (because I have 3000 rows of data). I'm not really sure what it should be doing though, could you explain?

    Thanks

    Sorry,

    I forget to put the colon : in the middle of the formula. This was kindly pointed out quickly by another poster. Apologies.

    At me ---> kincsem, always check your work. :o


    I just logged in and was surprised to see 26 posts.


  • Closed Accounts Posts: 4,013 ✭✭✭kincsem


    Prize, prize, what prize? :)


Advertisement