Thursday, November 26, 2009

I want a unique document

I recently needed to compare some data in two text documents. Unfortunately, while there are some great diff tools available, such as WinMerge, I also discovered that the files contained duplicate lines of data, which a) was a mistake and b) made the diff results a great deal larger and more complex.
While Linux distros typically have sort and uniq commands, Windows doesn't seem to have an equivalent command-line utility for stripping duplicate lines, so I was going to either go hunting for one or write a quick-and-dirty VBScript... Then I remembered PowerShell.
It turns out that PowerShell has a handy cmdlet called Get-Unique. Because it only removes duplicates that sit next to each other, it needs a sorted list (so if it's critical that your file remain in its original order, this example won't help). But a simple one-liner resolved the duplicate problem for me:
get-content .\filename | sort | get-unique > .\outfile.txt
OR
get-content .\filename | sort -unique > .\outfile.txt
The first command reads the file, pipes its lines to sort (an alias for the Sort-Object cmdlet), passes the sorted results to Get-Unique, and then redirects the output to outfile.txt. The second version is an alternative that uses the -Unique switch of Sort-Object to sort and remove duplicates in a single step.
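The two one-liners are not quite interchangeable. Get-Unique is documented as comparing strings case-sensitively, while Sort-Object -Unique treats lines that differ only in case as duplicates. A minimal sketch showing the difference, using made-up sample lines in place of a real file:

```powershell
# Hypothetical sample data standing in for the lines of a real file.
$lines = 'banana', 'Apple', 'apple', 'banana'

# Version 1: sort, then drop adjacent duplicates.
# Get-Unique compares case-sensitively, so 'Apple' and 'apple' both survive.
$viaGetUnique = @($lines | Sort-Object | Get-Unique)

# Version 2: sort and de-duplicate in one step.
# -Unique ignores case by default, so only one of 'Apple' / 'apple' survives.
$viaSortUnique = @($lines | Sort-Object -Unique)

"Get-Unique kept $($viaGetUnique.Count) lines"
"Sort-Object -Unique kept $($viaSortUnique.Count) lines"
```

With this sample the first version keeps 3 lines and the second keeps 2. If you want the one-step form to behave like Get-Unique, adding -CaseSensitive to Sort-Object should make its -Unique comparison case-sensitive too.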
Very quick, easy, one-line fix...
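For the case set aside earlier, where the file must stay in its original order, one possible approach (not the one used above) is to skip sorting and pass each line through only the first time it appears, tracking seen lines in a .NET HashSet. A sketch, assuming PowerShell 5 or later; Get-UniqueInOrder is a hypothetical helper name, not a built-in cmdlet:

```powershell
# Hypothetical helper (not a built-in cmdlet): emits each pipeline string the
# first time it is seen, preserving the original line order.
# Assumes PowerShell 5 or later for the ::new() constructor syntax.
function Get-UniqueInOrder {
    param(
        [Parameter(ValueFromPipeline)]
        [string]$Line
    )
    begin {
        # The default string comparer is case-sensitive, like Get-Unique.
        $seen = [System.Collections.Generic.HashSet[string]]::new()
    }
    process {
        # HashSet.Add returns $true only when the line was not already present.
        if ($seen.Add($Line)) { $Line }
    }
}
```

Usage mirrors the one-liners: `get-content .\filename | Get-UniqueInOrder > .\outfile.txt`. To ignore case instead, the set could be created with `[System.StringComparer]::OrdinalIgnoreCase` passed to the constructor.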
