Re: online awk application - viable



On Jan 15, 4:43 pm, Ted Davis <tda...@xxxxxxx> wrote:
On Thu, 15 Jan 2009 05:18:52 -0800, theleocullen wrote:
Hi all,

Thank you for taking the time to read my post. I was wondering about the
feasibility of writing a web application to carry out pattern matching in
text files - basically what awk was designed to do, but with a browser
based interface to make it easy for even those people who are
computer-phobic.

So the user would just browse to the file to be searched, select it, enter
the sought-after pattern in a text box, with radio-button options for the
desired operation (replace, cut, etc.) and another text box for the
replacement text, click 'search' and bingo, done. I know it could
be implemented in PHP or another language, but awk (which I have not used
much - yet!) seems well suited to the task.

I would be really interested in hearing your views, comments, derision,
etc.

If I were going to do something like that, I'd write a CGI script - in
awk/gawk - that would create an awk script, or select from a limited
selection matching the offered selections, perform a sanity test on the
pattern (passed in ENVIRON["QUERY_STRING"]), then shell the new script and
pass it the file (read from STDIN and saved in a temp file) sent from the
web page. The main CGI script would read the STDOUT from the shell, wrap
it in a suitable header (and perhaps some HTML for formatting), and then
return it to the browser (and delete the temp file).
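
A rough sketch of that flow, for illustration only. It is simplified:
rather than generating and shelling a second script, it applies the pattern
directly as a dynamic regex, and it takes the query string as an argument
where a real web server would set QUERY_STRING itself. The helper name
cgi_grep and the form-field name "pattern" are made up for the example, and
the whitelist in the sanity test is just one conservative choice.

```shell
# cgi_grep QUERY < file : hypothetical stand-in for the CGI script.
# A real web server would set QUERY_STRING itself; here it is passed as $1.
cgi_grep() {
    QUERY_STRING="$1" awk '
    BEGIN {
        # "pattern" is an assumed form-field name, not one from the post.
        n = split(ENVIRON["QUERY_STRING"], pair, "&")
        pat = ""
        for (i = 1; i <= n; i++)
            if (index(pair[i], "pattern=") == 1)
                pat = substr(pair[i], length("pattern=") + 1)

        # Sanity test: non-empty, bounded length, conservative character set.
        # Escaped input (%HH, +) is rejected here; a fuller script would
        # decode it first and then re-check.
        ok = (pat != "" && length(pat) <= 64 && pat ~ /^[A-Za-z0-9 ._-]+$/)

        # CGI response header, then a blank line before the body.
        print "Content-Type: text/plain\n"
        if (!ok) {
            print "rejected pattern"
            exit
        }
    }
    $0 ~ pat    # print matching lines; "." still acts as a regex wildcard
    '
}

printf 'alpha\nbeta\ngamma\n' | cgi_grep 'pattern=a.p'
```

Run as shown, this should print the header, a blank line, and then the
single matching line, alpha. A pattern containing anything outside the
whitelist (for example ";") gets the "rejected pattern" body instead.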

If you want awk-like behavior, the easiest way to get it is to use awk.

I use gawk for almost all the CGI scripts I have occasion to write.
Wonderful things can be done fairly easily using gawk under Apache with
Firefox as the display engine.

The main problems are that the browser will escape some characters in the
pattern (as %HH hex codes, with spaces sent as "+"), and "&", which also
separates the form fields, can be tricky to handle.

I'm sure the people here would be happy to assist with solving the
problems - as a start, here's the code I use to unescape characters from a
script that returns everything the CGI process can know about its STDIN
and environment:

function UnEscape( String,  y, c, p, Out ) {
# This function replaces % escape codes in HTML form returns with the
# characters they represent and replaces +s with spaces.
#
# Each %HH substring is isolated and rewritten as \xHH, HexToDec() turns
# that into a decimal character number, and chr() turns the number into
# the character that takes the place of the original escape sequence.

# First, replace + with space
    gsub( /\+/, " ", String )
# Another special case: newlines can be %0D%0A, %0D, %0A, or & (the field
# separator); the hex digits may arrive in either case.
    gsub( /%0[Dd]%0[Aa]/, "\n", String )
    gsub( /%0[AaDd]/, "\n", String )
    gsub( /&/,  "\n", String )

# Decode the remaining %HH sequences left to right.  The result is built by
# concatenation rather than with sub(): a decoded "&" or "\" is special in a
# sub() replacement string, and decoded text must not be scanned again
# (otherwise %2541 would become %41 and then "A").  Newline escape
# sequences have already been taken care of.
    Out = ""
    while( match(String, /%[0-9A-Fa-f][0-9A-Fa-f]/) ) {
        y = "\\x" substr( String, RSTART + 1, 2 )
        p = HexToDec( y )
        c = chr( p )
        Out = Out substr( String, 1, RSTART - 1 ) c
        String = substr( String, RSTART + RLENGTH )
    }

    return( Out String )

}

function chr(C)
{
# C is the decimal numeric value of a character.
# force C to be numeric by adding 0
    return( sprintf("%c", C + 0) )

}

function HexToDec( Hex,   d1, d2 ) {
# Hex is a string in the form \xnn where n is in the set 0-9a-f.

# Simple enough: the index of a digit in a 0-f string is one more than its
# decimal value.  It's easy enough to strip off the leading \x and split into
# two separate characters - the decimal value of the first gets multiplied by
# 16 and added to the value of the second.
    Hex = tolower( Hex )
    sub( /\\x/, "", Hex )
# substr() does the splitting here; split() with an empty separator is an
# extension that POSIX does not guarantee.
    d1 = substr( Hex, 1, 1 )
    d2 = substr( Hex, 2, 1 )
    d1 = (index("0123456789abcdef", d1) - 1 ) * 16
    d2 = index("0123456789abcdef", d2) - 1
    return( d1 + d2 )

}

I don't claim those to be optimum, but they have served me well for
several years.
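
To show how the field separator and the escapes interact, here is a small
self-contained sketch. It is not the functions above but a compact variant
of the same idea. It splits the query string on "&" first and only then
decodes each name and value, so an escaped %26 inside a value survives as a
literal "&". The names parse_query, urldecode and hexval are invented for
the example, and the query string is passed as an argument where a web
server would normally set QUERY_STRING.

```shell
# parse_query QUERY : hypothetical helper; prints one decoded name=value
# pair per line.
parse_query() {
    QUERY_STRING="$1" awk '
    function hexval(h) {
        # The index of a digit in the 0-f string is one more than its value.
        return index("0123456789abcdef", tolower(h)) - 1
    }
    function urldecode(s,    out, code) {
        gsub(/\+/, " ", s)              # "+" stands for a space
        out = ""
        while (match(s, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            code = hexval(substr(s, RSTART + 1, 1)) * 16 \
                 + hexval(substr(s, RSTART + 2, 1))
            # Append the text before the escape plus the decoded character,
            # then continue with the remainder only (no double decoding).
            out = out substr(s, 1, RSTART - 1) sprintf("%c", code)
            s = substr(s, RSTART + RLENGTH)
        }
        return out s
    }
    BEGIN {
        # Split on the field separator before decoding anything.
        n = split(ENVIRON["QUERY_STRING"], field, "&")
        for (i = 1; i <= n; i++) {
            eq = index(field[i], "=")
            if (eq == 0) continue       # skip fields without a value
            name  = urldecode(substr(field[i], 1, eq - 1))
            value = urldecode(substr(field[i], eq + 1))
            print name "=" value
        }
    }
    '
}

parse_query 'pattern=a%26b+c&op=replace'
```

With that input the two lines printed should be "pattern=a&b c" and
"op=replace".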

--

T.E.D. (tda...@xxxxxxx) MST (Missouri University of Science and Technology)
used to be UMR (University of Missouri - Rolla).

Wow! Thanks Ted,
what a great reply. Thanks so much for taking the time to write such a
detailed response, and for the cool functions.