How to select specific columns from a .txt

  • 17-02-2010 5:38pm
    #1
    Registered Users Posts: 8,452 ✭✭✭


    Hi folks,

    This is technically a question, but it's so noobish I've given it that prefix.

    Oftentimes I am faced with a Wall of Text when really I only want certain elements of it.

    For example, I might get output like:
    . reg wage school male exper if school>12
    
          Source |       SS       df       MS              Number of obs =     753
    -------------+------------------------------           F(  3,   749) =   24.81
           Model |  915.601886     3  305.200629           Prob > F      =  0.0000
        Residual |  9213.93113   749  12.3016437           R-squared     =  0.0904
    -------------+------------------------------           Adj R-squared =  0.0867
           Total |   10129.533   752  13.4701237           Root MSE      =  3.5074
    
    ------------------------------------------------------------------------------
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          school |   .6847627   .1578205     4.34   0.000     .3749396    .9945859
            male |   1.684875   .2566332     6.57   0.000     1.181069    2.188681
           exper |   .2706994   .0757407     3.57   0.000     .1220101    .4193887
           _cons |  -4.981732   2.296145    -2.17   0.030    -9.489377   -.4740875
    ------------------------------------------------------------------------------
    
    . reg wage school male exper if school<=12
    
          Source |       SS       df       MS              Number of obs =    2543
    -------------+------------------------------           F(  3,  2539) =   40.60
           Model |  1921.04819     3  640.349397           Prob > F      =  0.0000
        Residual |  40042.0625  2539  15.7708005           R-squared     =  0.0458
    -------------+------------------------------           Adj R-squared =  0.0447
           Total |  41963.1107  2542  16.5079114           Root MSE      =  3.9712
    
    ------------------------------------------------------------------------------
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          school |   .4648227   .0634931     7.32   0.000     .3403191    .5893263
            male |   1.403662   .1605335     8.74   0.000     1.088872    1.718452
           exper |   .0587847   .0332937     1.77   0.078     -.006501    .1240703
           _cons |   -.970499   .8024911    -1.21   0.227    -2.544103    .6031047
    ------------------------------------------------------------------------------
    
    which is fine, but not much use to me for presentation purposes: I might just want to report the t-statistics. With only 8 values, it's obviously not much bother to enter them by hand into whatever Excel table I want. I could even use a decent text editor to select those columns.

    However I would like the ability to write a script in something like PHP that could extract the columns as arrays. I would like this as a general skill rather than just writing a script to extract the info from the above output.

    For the automated processes that I write I've always gotten by with simple PHP or even JavaScript but that's the extent of my programming knowledge.

    Here's what I'd like to know:
    1. What's a good language for gathering what I want from text files, assuming it's in some standard ASCII format?
    2. Failing that, is there some way of getting somewhat messy text like that into Excel columns pretty quickly?
    Thanks in advance.


Comments

  • Registered Users Posts: 4,277 ✭✭✭km991148


    Well, whatever language you use, it's regular expressions you want to start working with.
    About the above, though: where are you getting that output from? If you can get it into CSV format or something more suitable for automatic parsing, it would make life a whole lot easier.


  • Registered Users Posts: 1,916 ✭✭✭ronivek


    Perl is about the most powerful text-processing language out there (and it isn't limited to text processing either).

    Regular expressions are a way of extracting and matching text, but they are a bit of an arcane art in some respects. They're implemented, in various forms, in pretty much any programming language you can think of.

    There are also various Unix tools and so on depending on your platform of choice and/or requirements.

    If you want to put in some time and effort to learn, though, I'd recommend regular expressions; as mentioned above, they're largely implementation-agnostic.
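
    As a rough illustration of the regular-expression approach (sketched in Python here, though the pattern itself should carry over to Perl or PHP with little change), the snippet below pulls the t column out of a coefficient table like the one in the first post. The names `NUM`, `ROW` and `t_statistics` are made up for this example, and the pattern assumes every coefficient row is a label, a `|`, and exactly six numbers.

```python
import re

# A few lines copied from the output in the first post.
SAMPLE = """\
       Model |  915.601886     3  305.200629           Prob > F      =  0.0000
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      school |   .6847627   .1578205     4.34   0.000     .3749396    .9945859
        male |   1.684875   .2566332     6.57   0.000     1.181069    2.188681
       exper |   .2706994   .0757407     3.57   0.000     .1220101    .4193887
       _cons |  -4.981732   2.296145    -2.17   0.030    -9.489377   -.4740875
"""

# A number as printed here: optional sign, optional leading digits, optional point.
NUM = r"-?\d*\.?\d+"

# Assumed row shape: label | coef  std.err  t  p  ci_low  ci_high
ROW = re.compile(
    rf"^\s*(\w+)\s*\|\s*({NUM})\s+({NUM})\s+({NUM})\s+({NUM})\s+({NUM})\s+({NUM})\s*$"
)


def t_statistics(text):
    """Return {variable name: t-statistic} for every line that matches ROW."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            # Group 1 is the label; group 4 is the third number, i.e. the t column.
            result[match.group(1)] = float(match.group(4))
    return result


print(t_statistics(SAMPLE))
```

    Header, rule and summary lines are skipped simply because they don't fit the pattern, which is what makes this approach reusable on other fixed-layout output once the pattern is adjusted.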


  • Registered Users Posts: 8,452 ✭✭✭Time Magazine


    km991148 wrote:
    Well, whatever language you use, it's regular expressions you want to start working with.
    About the above, though: where are you getting that output from? If you can get it into CSV format or something more suitable for automatic parsing, it would make life a whole lot easier.

    It's from statistics software. That's bog-standard output, and hacks exist to convert it to Excel/LaTeX etc., but such hacks don't work for the more obscure commands. I'd also like to be able to do it no matter what the output, statistical or otherwise.

    Regular expressions and Perl seem to be the way to go.

    Thanks for the advice, ronivek.


  • Registered Users Posts: 2,781 ✭✭✭amen


    Windows Log Parser might help.

    You can write SQL-like queries against the data.


  • Registered Users Posts: 3,721 ✭✭✭E39MSport


    Perl would be the RR to some, but as ronivek says, there's plenty you can do if you have access to a *nix platform or even an emulator. Using sed, awk etc., you can get great results, especially in conjunction with a simple shell script that reads the file line by line and outputs the data you need.
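
    The same read-line-by-line, split-into-fields idea can be sketched outside the shell too. The Python below is only an illustration of that awk-style approach, not a replacement for sed or awk: it splits each line at the first `|`, keeps rows whose remaining fields are all numbers, and writes the chosen field out as CSV, which Excel opens directly. `is_number`, `pick_fields` and `to_csv` are names invented for this example.

```python
import csv
import io

# A few lines copied from the output in the first post.
SAMPLE = """\
. reg wage school male exper if school>12

       Model |  915.601886     3  305.200629           Prob > F      =  0.0000
------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      school |   .6847627   .1578205     4.34   0.000     .3749396    .9945859
        male |   1.684875   .2566332     6.57   0.000     1.181069    2.188681
       exper |   .2706994   .0757407     3.57   0.000     .1220101    .4193887
       _cons |  -4.981732   2.296145    -2.17   0.030    -9.489377   -.4740875
"""


def is_number(token):
    """True if the token parses as a float (e.g. '.68', '-2.17')."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def pick_fields(lines, wanted):
    """Yield [label, field, ...] for rows that are 'label | number number ...'.

    `wanted` holds 0-based positions among the numbers after the '|'.
    """
    for line in lines:
        label, bar, rest = line.partition("|")
        values = rest.split()  # whitespace-separated fields, as awk would see them
        if bar and len(values) > max(wanted) and all(is_number(v) for v in values):
            yield [label.strip()] + [values[i] for i in wanted]


def to_csv(rows, header):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Field 2 after the '|' is the t column (0 = Coef., 1 = Std. Err., 2 = t).
rows = list(pick_fields(SAMPLE.splitlines(), wanted=[2]))
print(to_csv(rows, ["variable", "t"]))
```

    Filtering on "everything after the `|` is numeric" is a convenient assumption for this particular layout; other programs' output would need a different test for which lines count as data.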


  • Registered Users Posts: 1,109 ✭✭✭Skrynesaver


    I reckon Perl is the easiest way to go.
    Is this a standard format (i.e. the output of a common program)? If so, search for the program name on CPAN; there may already be a module for processing the output.


  • Registered Users Posts: 4,081 ✭✭✭sheesh


    If the large spaces above are tabs, you might be able to paste the whole thing into Word and convert it to a table quite easily by telling Word to create cells from tabs (there was an option to do that).

    Alternatively, look at a text editor like TextPad; it allows you to block-select text, so you could select some of the blocks manually.

    That output looks like it could come from a database. Is it possible for you to run queries yourself? If you knew what you needed, it might be worth your while getting some queries done.

    If it is a database, it could also be output as XML; this would give each bit of information context and allow you to easily program selections yourself.

    But as other people said, Perl would be the scripting language of choice for this kind of operation.
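
    The block-selection idea above also has a scripted equivalent when the output is fixed-width: take the same character positions from every line. The Python sketch below assumes the rows are exactly as the statistics package printed them (without the extra indentation the forum adds), and the offsets in `LABEL` and `T_STAT` were counted by hand from this particular sample, so they would need re-counting for any other layout. The names are made up for this example.

```python
# Character ranges counted from the sample rows below (0-based, end exclusive).
LABEL = slice(0, 13)    # everything before the '|'
T_STAT = slice(36, 45)  # the right-aligned t column

# Coefficient rows from the first post, with the forum's 4-space indent removed.
ROWS = [
    "      school |   .6847627   .1578205     4.34   0.000     .3749396    .9945859",
    "        male |   1.684875   .2566332     6.57   0.000     1.181069    2.188681",
    "       exper |   .2706994   .0757407     3.57   0.000     .1220101    .4193887",
    "       _cons |  -4.981732   2.296145    -2.17   0.030    -9.489377   -.4740875",
]


def cut(lines, *columns):
    """Return one tuple per line holding the stripped text of each column slice."""
    return [tuple(line[column].strip() for column in columns) for line in lines]


print(cut(ROWS, LABEL, T_STAT))
```

    Slicing by position is more brittle than splitting on whitespace, but it copes with columns whose values contain spaces, which is the same trade-off as block-selecting in an editor.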

