How to select specific columns from a .txt

Time Magazine · 17-02-2010 5:38pm #1

Hi folks,

This is technically a question, but it's so noobish I've given it that prefix.

Oftentimes I am faced with a Wall of Text when really I only want certain elements of it.

For example, I might get output like:

. reg wage school male exper if school>12

      Source |       SS       df       MS              Number of obs =     753
-------------+------------------------------           F(  3,   749) =   24.81
       Model |  915.601886     3  305.200629           Prob > F      =  0.0000
    Residual |  9213.93113   749  12.3016437           R-squared     =  0.0904
-------------+------------------------------           Adj R-squared =  0.0867
       Total |   10129.533   752  13.4701237           Root MSE      =  3.5074

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      school |   .6847627   .1578205     4.34   0.000     .3749396    .9945859
        male |   1.684875   .2566332     6.57   0.000     1.181069    2.188681
       exper |   .2706994   .0757407     3.57   0.000     .1220101    .4193887
       _cons |  -4.981732   2.296145    -2.17   0.030    -9.489377   -.4740875
------------------------------------------------------------------------------

. reg wage school male exper if school<=12

      Source |       SS       df       MS              Number of obs =    2543
-------------+------------------------------           F(  3,  2539) =   40.60
       Model |  1921.04819     3  640.349397           Prob > F      =  0.0000
    Residual |  40042.0625  2539  15.7708005           R-squared     =  0.0458
-------------+------------------------------           Adj R-squared =  0.0447
       Total |  41963.1107  2542  16.5079114           Root MSE      =  3.9712

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      school |   .4648227   .0634931     7.32   0.000     .3403191    .5893263
        male |   1.403662   .1605335     8.74   0.000     1.088872    1.718452
       exper |   .0587847   .0332937     1.77   0.078     -.006501    .1240703
       _cons |   -.970499   .8024911    -1.21   0.227    -2.544103    .6031047
------------------------------------------------------------------------------

which is fine, but not much use to me for presentation purposes. I might just want to report the t-statistics. When there are only 8 values, it's obviously not much bother to enter them by hand into whatever Excel table I want. I could even use a decent text editor to select those columns.

However I would like the ability to write a script in something like PHP that could extract the columns as arrays. I would like this as a general skill rather than just writing a script to extract the info from the above output.

For the automated processes that I write I've always gotten by with simple PHP or even JavaScript but that's the extent of my programming knowledge.

Here's what I'd like to know:

What's a good language for gathering what I want from text files, assuming it's in some standard ASCII format?
Failing that, is there some way of getting somewhat messy text like that into Excel columns pretty quickly?

Thanks in advance.

km991148 · 17-02-2010 5:49pm

Well, whatever language you use, it's regular expressions you want to start working with.
About the above, though: where are you getting that output from? If you can get it into CSV format or something more suitable for automatic parsing, it would make life a whole lot easier.
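
A rough sketch of that idea in Python (the thread itself only mentions PHP, JavaScript and Perl, so the language is a substitution): a regular expression picks out the coefficient rows and the `csv` module turns them into something Excel can open. The pattern, the function names and the column labels are assumptions based on the sample output in the first post, not a general parser for the statistics package.

```python
import csv
import io
import re

# A number as it appears in the sample output: optional sign, optional
# leading digits, optional decimal point (e.g. ".6847627", "-2.17", "3").
NUM = r"-?\d*\.?\d+"

# One coefficient row: a name, a pipe, then exactly six numbers.
ROW = re.compile(
    rf"^\s*(\w+)\s*\|\s*({NUM})\s+({NUM})\s+({NUM})\s+({NUM})\s+({NUM})\s+({NUM})\s*$"
)

HEADER = ["variable", "coef", "std_err", "t", "p", "ci_low", "ci_high"]


def coefficient_rows(text):
    """Return (name, coef, std_err, t, p, ci_low, ci_high) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and separator lines simply fail to match
            rows.append((m.group(1), *map(float, m.groups()[1:])))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


sample = """\
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      school |   .6847627   .1578205     4.34   0.000     .3749396    .9945859
        male |   1.684875   .2566332     6.57   0.000     1.181069    2.188681
       exper |   .2706994   .0757407     3.57   0.000     .1220101    .4193887
       _cons |  -4.981732   2.296145    -2.17   0.030    -9.489377   -.4740875
"""

rows = coefficient_rows(sample)
t_stats = [row[3] for row in rows]  # just the t column
print(t_stats)
print(to_csv(rows))
```

The same pattern should carry over to Perl or PHP with little change, since all three use very similar regular expression syntax.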

ronivek · 17-02-2010 6:09pm

Perl is about the most powerful text-processing language out there (and it isn't limited to text processing either).

Regular expressions are a way of extracting and matching text, but they're a bit of an arcane art in some respects. They're implemented, in various forms, in pretty much any programming language you can think of.

There are also various Unix tools and so on, depending on your platform of choice and your requirements.

If you want to put in some time and effort to learn, though, I'd recommend regular expressions; as mentioned above, they're largely implementation-agnostic.

Time Magazine · 17-02-2010 6:34pm

km991148 wrote:

Well, whatever language you use, it's regular expressions you want to start working with.
About the above, though: where are you getting that output from? If you can get it into CSV format or something more suitable for automatic parsing, it would make life a whole lot easier.

It's from statistics software. That's bog-standard output, and hacks exist to convert it to Excel/LaTeX etc., but such hacks don't work for the more obscure commands. I'd also like to be able to do it no matter what the output is, statistical or otherwise.

Regular expressions and Perl seem to be the way to go.

Thanks for the advice, ronivek.

amen · 17-02-2010 11:15pm

Windows Log Parser might help.

You can write SQL-like queries against the data.
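
To give a feel for what querying the results with SQL could look like, here is a small sketch that swaps in Python's built-in `sqlite3` module rather than Log Parser itself. The table name `coef`, its columns and the `ABS(t) > 2` cut-off are made up for illustration; the numbers are copied by hand from the two regressions in the first post.

```python
import sqlite3

# (sample restriction, variable, coefficient, t-statistic), typed in from
# the output in the first post.
rows = [
    ("school>12", "school", 0.6847627, 4.34),
    ("school>12", "male", 1.684875, 6.57),
    ("school>12", "exper", 0.2706994, 3.57),
    ("school>12", "_cons", -4.981732, -2.17),
    ("school<=12", "school", 0.4648227, 7.32),
    ("school<=12", "male", 1.403662, 8.74),
    ("school<=12", "exper", 0.0587847, 1.77),
    ("school<=12", "_cons", -0.970499, -1.21),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE coef (sample TEXT, variable TEXT, coef REAL, t REAL)")
conn.executemany("INSERT INTO coef VALUES (?, ?, ?, ?)", rows)

# Which coefficients have a t-statistic larger than 2 in absolute value?
significant = conn.execute(
    "SELECT sample, variable, t FROM coef "
    "WHERE ABS(t) > 2 ORDER BY sample, variable"
).fetchall()

for sample, variable, t in significant:
    print(f"{sample:12} {variable:8} {t:6.2f}")
```

Once the numbers are in a table, picking a column or filtering rows is a one-line query instead of another round of text munging.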

E39MSport · 18-02-2010 9:51am

Perl would be the RR to some, but as ronivek says, there's plenty you can do if you have access to a *nix platform or even an emulator. Using sed, awk etc., you can get great results, especially in conjunction with a simple shell script that reads the file line by line and outputs the data you need.
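
The line-by-line approach can be sketched in Python too, for anyone without a *nix shell to hand. This is only an approximation of what an awk script would do: split each line on whitespace, remember which command the block belongs to, and keep one field from rows that look like coefficient rows. The function name and the assumption that commands start with ". " are taken from the sample output, nothing more.

```python
def column_by_command(lines, column=2):
    """Map each '. command' line to {variable: chosen numeric field}.

    column counts the numbers after the pipe: 0 = coef, 1 = std. err.,
    2 = t, 3 = p, 4 and 5 = confidence interval bounds.
    """
    results = {}
    current = None
    for line in lines:
        fields = line.split()  # awk-style whitespace splitting
        if not fields:
            continue
        if fields[0] == ".":  # a new command starts a new block
            current = " ".join(fields[1:])
            results[current] = {}
            continue
        # Coefficient rows: name, pipe, then six fields.
        if current is None or len(fields) != 8 or fields[1] != "|":
            continue
        try:
            numbers = [float(f) for f in fields[2:]]
        except ValueError:
            continue  # e.g. the "Residual" row, which also has 8 fields
        results[current][fields[0]] = numbers[column]
    return results


sample = """\
. reg wage school male exper if school>12

    Residual |  9213.93113   749  12.3016437           R-squared     =  0.0904
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      school |   .6847627   .1578205     4.34   0.000     .3749396    .9945859
       _cons |  -4.981732   2.296145    -2.17   0.030    -9.489377   -.4740875

. reg wage school male exper if school<=12

       exper |   .0587847   .0332937     1.77   0.078     -.006501    .1240703
"""

t_stats = column_by_command(sample.splitlines())
print(t_stats)
```

Passing an open file object instead of `sample.splitlines()` should work the same way, since the function only iterates over lines. Note the check that every field after the pipe is a number: counting fields alone would let the "Residual" row through.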

Skrynesaver · 18-02-2010 9:15pm

I reckon Perl is the easiest way to go.
Is this a standard format (i.e. the output of a common program)? If so, search for the program name on CPAN; there may already be a module for processing the output.

sheesh · 22-02-2010 2:53pm

If the large spaces above are tabs, you might be able to paste the whole thing into Word and convert it to a table quite easily by telling Word to create cells from tabs (there was an option to do that).

Alternatively, look at a text editor like TextPad. It allows you to block-select text, so you could select some of the blocks manually.
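
Block selection has a scripted equivalent: slicing every line at the same character positions. The sketch below is in Python, and the offsets in `COLUMNS` were counted by hand from the sample output in the first post, so treat them as assumptions that would need re-checking against any other output (and they only hold if the gaps are spaces, not tabs).

```python
# Character ranges (start, end) for each column, counted from the sample
# coefficient table. These are hand-measured assumptions, not a spec.
COLUMNS = {
    "variable": (0, 12),
    "coef": (14, 25),
    "std_err": (25, 36),
    "t": (36, 45),
    "p": (45, 53),
    "ci_low": (53, 66),
    "ci_high": (66, 78),
}


def parse_fixed_width(lines, spec=COLUMNS):
    """Cut each table row at fixed character positions."""
    table = []
    for line in lines:
        if line[13:14] != "|":  # separator rows have "+" or "-" here
            continue
        cells = {name: line[a:b].strip() for name, (a, b) in spec.items()}
        try:
            for name in spec:
                if name != "variable":
                    cells[name] = float(cells[name])
        except ValueError:
            continue  # header row: "Coef." is not a number
        table.append(cells)
    return table


sample = [
    "        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]",
    "-------------+----------------------------------------------------------------",
    "      school |   .6847627   .1578205     4.34   0.000     .3749396    .9945859",
    "        male |   1.684875   .2566332     6.57   0.000     1.181069    2.188681",
    "       exper |   .2706994   .0757407     3.57   0.000     .1220101    .4193887",
    "       _cons |  -4.981732   2.296145    -2.17   0.030    -9.489377   -.4740875",
]

table = parse_fixed_width(sample)
for row in table:
    print(row["variable"], row["t"])
```

Fixed positions are more brittle than splitting on whitespace, but they still work when a cell is empty or contains spaces, which is where whitespace splitting falls over.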

That output looks like it could come from a database. Is it possible for you to run queries yourself? If you knew what you needed, it might be worth your while getting some queries done.

If it is a database, the output could also be produced as XML. That would give each bit of information some context and let you easily program selections for yourself.

But as other people said, Perl would be the scripting language of choice for this kind of operation.
