Re: read ahead or before



Ted Davis wrote:
On Sat, 26 Jul 2008 12:02:48 -0700, Mag Gam wrote:


I have been trying to do this instead of placing everything in a hash/
array and compare in the END block.

For example, if I have a file like this

111
2222
333
333
4445
3434

Notice there is a duplicate "333". How can I test if the next line is
the same as the current line? I suppose I can use getline() but is there
another clever way of achieving this?

Also, how can I check for previous line?


Functionally, this is the same as PK's suggestion; it is just written out
in a fuller (C-like) and, I hope, clearer form. Since you didn't say
what you want to do with the lines after suppressing adjacent duplicates,
I wrote it to print the non-duplicate lines as it encounters them. It
should not be sensitive to file size, because it stores only one line
at a time.

{
    if ($0 != Prev) print $0
    Prev = $0
}

In minimalist awk format, that's
$0 != Prev {print}
{Prev = $0}


As a command-line program, that could be (minimalist format):

awk '$0!=Prev{print}{Prev=$0}' source > target
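For illustration, here is a small sketch of that one-liner run against the sample data from the question. The file name sample.txt is made up for the example. The second command is a variant that is not from this thread: it appends an empty string to both sides, on the assumption that a strict string comparison is wanted. As far as I know, awk compares input that looks numeric as numbers, so adjacent lines such as 1 and 1.0 would otherwise count as duplicates.

```shell
# sample.txt is a hypothetical file holding the data from the question
printf '%s\n' 111 2222 333 333 4445 3434 > sample.txt

# print each line unless it equals the line immediately before it
awk '$0 != Prev { print } { Prev = $0 }' sample.txt

# variant (an assumption, not from the thread): concatenating "" forces a
# string comparison, so "1" and "1.0" are kept as two different lines
printf '%s\n' 1 1.0 1.0 | awk '($0 "") != (Prev "") { print } { Prev = $0 }'
```

With the sample data the first command should print 111, 2222, 333, 4445 and 3434, one per line. Note that an empty first line would be suppressed by either form, because an unset Prev compares equal to the empty string.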

If we're going to go minimalist, maybe even...

awk '$0!=prev;{prev=$0}' source > target


Janis


(Tested with your sample data under Fedora and XP; under XP as a script
file only, with all variations tested under Linux.)
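
On the other half of the question, testing the next line without getline: one possible approach, sketched here as an assumption rather than something posted in this thread, is to delay processing by one record. Each line is held back until its successor has been read, so at that point $0 is "the next line" for the held one. The names mark_next_dup, Held and sample.txt are made up for the illustration.

```shell
# hypothetical sample file, same data as in the question
printf '%s\n' 111 2222 333 333 4445 3434 > sample.txt

# mark_next_dup: print every line, tagging those whose next line is identical
mark_next_dup() {
    awk '
    NR > 1 {
        # Held is the previous record; $0 is its successor
        if (($0 "") == (Held "")) {
            print Held " (next line is the same)"
        } else {
            print Held
        }
    }
    { Held = $0 }                    # remember the current record
    END { if (NR > 0) print Held }   # the last line has no successor
    ' "$@"
}

mark_next_dup sample.txt
```

Like the versions above, this keeps only one line in memory, at the cost of emitting each line one record late and flushing the final line in the END block.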

BTW, "gigabytes" is usually abbreviated GB (Gb would be "gigabits").
Symbols for SI prefixes larger than kilo are all upper case, and those
smaller than mega are all lower case. The full prefix names are written
in lower case unless the language requires initial capitals. (k and K
have an unofficial usage in the byte/bit context: k = 1000; K = 1024.)
