Re: compare values in the same field in consecutive rows--and store the result in an array



On Jan 28, 9:56 am, Ed Morton <mor...@xxxxxxxxxxxxxx> wrote:
On 1/24/2008 8:56 PM, z.entropic wrote:



On Jan 24, 5:40 pm, Ed Morton <mor...@xxxxxxxxxxxxxx> wrote:

On 1/24/2008 4:35 PM, z.entropic wrote:

On Jan 24, 3:33 pm, Ed Morton <mor...@xxxxxxxxxxxxxx> wrote:

On 1/24/2008 2:25 PM, z.entropic wrote:

On Jan 24, 2:41 pm, Ed Morton <mor...@xxxxxxxxxxxxxx> wrote:

On 1/24/2008 1:19 PM, z.entropic wrote:
<snip>

Here is a fragment of my input file:

============
100   24479.33   14399.09   1/23/2008 19:55   6   1   0   0             3.293   1.287
101   24480.25   14400.01   1/23/2008 19:55   6   1   0   0             3.296   1.288
102   24480.36   0.11       1/23/2008 19:55   7   1   0   -0.00185954   3.167   1.287
=============

Thus, if field $6 in line 100 is equal to 6 AND field $6 in line 101
is equal to 7, store the value 14399.09 in array1[1] and 1.287 in
array2[1]. When the next match is found, store the two values in
array1[2] and array2[2], etc.  Basically, I'm comparing the same
fields in consecutive rows.

Try this:

awk '($6==7)&&(p6==6){array1[++n]=p3;array2[n]=p11} {p6=$6;p3=$3;p11=$11}' file

You should use an array for the "p" (previous) field values if you need to
access more of them.

     Ed.
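
A minimal sketch of the "array for the previous field values" idea, not taken from the thread: split() copies every field of the current record into an array, here given the hypothetical name prev, so any field of the previous line is available on the next one. The three rows are the fragment quoted above, and a counting loop is used in END so the output order is predictable.

```sh
# Rows from the fragment quoted above (whitespace-separated, 11 fields each).
out=$(printf '%s\n' \
  '100 24479.33 14399.09 1/23/2008 19:55 6 1 0 0 3.293 1.287' \
  '101 24480.25 14400.01 1/23/2008 19:55 6 1 0 0 3.296 1.288' \
  '102 24480.36 0.11 1/23/2008 19:55 7 1 0 -0.00185954 3.167 1.287' |
awk '
  # prev[] (hypothetical name) holds all fields of the previous record.
  ($6 == 7) && (prev[6] == 6) { array1[++n] = prev[3]; array2[n] = prev[11] }
  { split($0, prev) }   # remember every field of this record for the next one
  END { for (i = 1; i <= n; i++) print i, array1[i], array2[i] }
')
echo "$out"
```

With these rows the match is between lines 101 and 102, so the stored values come from line 101.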

I think my example is a bit confusing due to poor formatting (copying
from Excel with the wrong date format didn't help...).  In essence, I
can't get the script working even after some changes, so let me
explain again as best I can.  Here is an interesting section from
one of my data files, this time with the proper formatting that awk
would see (tab-separated fields):

100 24479.32 14399.08 1/23/2008 7:55:39 PM 6 1 0  0          3.293399
101 24480.25 14400.01 1/23/2008 7:55:40 PM 6 1 0  0          3.293234
102 24480.36     0.10 1/23/2008 7:55:41 PM 7 1 0 -0.00185954 3.166826
103 24480.46     0.21 1/23/2008 7:55:41 PM 7 1 0 -0.00185932 3.034836

Simply put, I want to find pairs of lines in which the counter in
field $7 changes, here from 6 to 7, and then store in array array1[1]
the value found in field $11 (3.293234, line 101). The next pair of
found lines would change the array counter to 2 (array[2]).

So now we're back to one array? ok, look:

$ awk '($7==7)&&(p7==6){array[++n]=p11} {p7=$7;p11=$11} END{for (i in array) print i, array[i]}' file
1 3.293234

Once I figure out with your help how to do that, I'll try to expand
this script to store more values, including some from line 102 in the
example above.

If the above still isn't what you're looking for either, maybe posting a little
more sample input and some expected output would help.

      Ed.

Your script works--in part, probably because I underspecified the
requirements.  I think the problem is a bit more complex; I'll provide
a larger example of input and output.

OK, but if it's just that you want to get output every time the 7th field
changes rather than when it specifically changes from 6 to 7, then all you'd
need is:

$ awk 'p7&&($7!=p7){array[++n]=p11} {p7=$7;p11=$11} END{for (i in array) print i, array[i]}' file
1 3.293234

so also see if that's what you're really looking for....
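
To see how the two tests differ, here is a small sketch on made-up rows (only $7 and $11 matter; the other fields are placeholders). The variable names rows, specific and any are hypothetical. One caveat worth stating: the p7&& guard treats a previous value of 0, or an empty field, as "no previous line", so a change away from 0 would not be reported.

```sh
# Hypothetical input: counter in $7 goes 6, 6, 7, 7, 8; value of interest in $11.
rows='1 a b c d e 6 x y z 10.5
2 a b c d e 6 x y z 11.5
3 a b c d e 7 x y z 12.5
4 a b c d e 7 x y z 13.5
5 a b c d e 8 x y z 14.5'

# Only the specific 6 -> 7 transition.
specific=$(printf '%s\n' "$rows" |
  awk '($7==7)&&(p7==6){array[++n]=p11} {p7=$7;p11=$11}
       END{for (i=1;i<=n;i++) print i, array[i]}')

# Any change of $7 between consecutive lines.
any=$(printf '%s\n' "$rows" |
  awk 'p7&&($7!=p7){array[++n]=p11} {p7=$7;p11=$11}
       END{for (i=1;i<=n;i++) print i, array[i]}')

echo "specific: $specific"
echo "any:"
echo "$any"
```

The first form records one value (from row 2); the second records two (from rows 2 and 4).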

I believe this kind of a problem may be of interest to a wider group
of readers and awk users as it concerns data extraction and processing
that I, at least, often encounter.

z.e.

I think this is the closest so far to my goal--the $11 values printed
out are those I am after, and the lines I'm interested in are always
those where one of the fields, a loop counter of sorts, changes
value. Now, a few questions on modifying the latest script to expand
its functionality:

1. how to store the value in an additional field, e.g., $10, and print
it out on the same line?  I've tried

awk 'p7&&($7!=p7){V[++n]=p11}{c[++m]==p10} {p7=$7;p10=$10;p11=$11} END

ITYM c[++m]=p10 instead of c[++m]==p10.

{for (i in V) print i, V[i],c[i]}'

but obviously this expression doesn't work as intended (the for loop
is incomplete...).  Should I use two independent loops and a \n at the
end of the first statement?  The n and m indices are always the same,
but I can't use n twice as it increases in both expressions...

If you don't want n to increase twice, just don't increment it twice:

awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10} {p7=$7;p10=$10;p11=$11}
END{for (i in V) print i, V[i],c[i]}'



2. how could I store and print out $11 from the next line (with an
already changed $7?)

awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11} {p7=$7;p10=$10;p11=$11}
END{for (i in V) print i, V[i],c[i],d[i]}'

3. I'd like to store and print the source FILENAME on each line.

awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11;e[n]=FILENAME}
{p7=$7;p10=$10;p11=$11}
END{for (i in V) print i, V[i],c[i],d[i],e[i]}'

but you don't need to store it if it's just one input file:

awk 'p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11} {p7=$7;p10=$10;p11=$11}
END{for (i in V) print i, V[i],c[i],d[i],FILENAME}'

4. I'd like to skip the first 5 or 10 lines (I think I know how to do
that...)

awk 'NR<=10{next}
p7&&($7!=p7){V[++n]=p11;c[n]=p10;d[n]=$11;e[n]=FILENAME} {p7=$7;p10=$10;p11=$11}
END{for (i in V) print i, V[i],c[i],d[i],e[i]}'

5. I assume that if I wanted more complex conditions, I could combine
them as in

awk '(p7&&($7!=p7))&&(p8&&($8!=p8))...'

but what if I'd like to use $8 on the next line, with a changed value
of $7?

I don't know what you mean by that.

Hmmm... this is getting more complex than I initially expected...

Just follow the pattern....
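
One way to read "follow the pattern", as a hedged sketch: anything wanted from the line before the change is copied into a p-variable in the last block, and anything wanted from the line where $7 has already changed is read straight from $N in the first block. The run below uses the four tab-separated sample rows posted earlier in the thread. The array e and the choice of $8 are hypothetical, just one guess at what question 5 might be asking for.

```sh
out=$(printf '%s\n' \
  '100 24479.32 14399.08 1/23/2008 7:55:39 PM 6 1 0  0          3.293399' \
  '101 24480.25 14400.01 1/23/2008 7:55:40 PM 6 1 0  0          3.293234' \
  '102 24480.36     0.10 1/23/2008 7:55:41 PM 7 1 0 -0.00185954 3.166826' \
  '103 24480.46     0.21 1/23/2008 7:55:41 PM 7 1 0 -0.00185932 3.034836' |
awk '
  # Fires on the first line whose $7 differs from the previous line.
  p7 && ($7 != p7) {
    V[++n] = p11   # $11 from the line before the change
    c[n]   = p10   # $10 from the line before the change
    d[n]   = $11   # $11 from the line where $7 has already changed
    e[n]   = $8    # hypothetical: $8 from that same changed line
  }
  { p7 = $7; p10 = $10; p11 = $11 }
  END { for (i = 1; i <= n; i++) print i, V[i], c[i], d[i], e[i] }
')
echo "$out"
```

Here the only change is between lines 101 and 102, so a single output line is produced.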

        Ed.

I took your latest script, cleaned it up a bit for clarity, changed
some letters (to make them more meaningful for me during the debugging
and learning process)--and it almost works the way I want it to!

( NR < 8 ) && ( $7 < 6 ) { next }
s7 && ( $7 != s7 ) { V[++n] = V11; c[n] = c10; U[n] = $11; f[n] = FILENAME }
{ s7 = $7; c10 = $10; V11 = $11 }
END { for (i in V ) print i, f[i], c[i], V[i], U[i] }

However, I still have a few problems:

1. the first two inequalities seem to be disregarded, and unwanted
data are stored in the array and then printed out.
2. the data are printed out in reverse order (from the highest i to
1)... how come?
3. how do I impose an i>6 condition in the last 'for' printout loop
(see #1)?

z.e.
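
A hedged sketch touching the three points above, not a reply from the thread. First, ( NR < 8 ) && ( $7 < 6 ) skips a line only when both tests are true, so any line from the 8th onward is always processed. If the intent was to skip the first 7 lines and also any line with $7 below 6, then || would do that, but that reading of the intent is an assumption. Second, awk does not specify the order in which for (i in V) visits indices, so a counting loop from 1 to n is the usual way to print in stored order. Third, the counting loop can start at any index; the variable first below is hypothetical and would be set to 7 for an i>6 condition. The input is generated and made up, and FILENAME is left out because the data arrive on standard input.

```sh
out=$(
  # Hypothetical input: 12 rows of 11 fields, counter in $7 running 4..9
  # with two rows per value, $10 = counter * 10, $11 = row number + 0.5.
  awk 'BEGIN { for (k = 4; k <= 9; k++) for (r = 1; r <= 2; r++)
                 print ++row, "a", "b", "c", "d", "e", k, "x", "y", k * 10, row + 0.5 }' |
  awk -v first=1 '
    ( NR < 8 ) || ( $7 < 6 ) { next }   # skip if EITHER test holds (assumed intent)
    s7 && ( $7 != s7 ) { V[++n] = V11; c[n] = c10; U[n] = $11 }
    { s7 = $7; c10 = $10; V11 = $11 }
    # Counting loop: prints in the order the values were stored.
    END { for (i = first; i <= n; i++) print i, c[i], V[i], U[i] }
  '
)
echo "$out"
```

Rows 1 to 7 are skipped, so the first recorded change is the one from 7 to 8 between rows 8 and 9, followed by the change from 8 to 9.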