How to extract the difference between two files?

Say that you have two files, file1.txt and file2.txt, with the contents shown below:

file1.txt       file2.txt
satish          satish
devarapalli     apache
java            slf4j
linux           osgi
memory          java
slf4j           linux
osgi            axis2
tomcat          tomcat
apache
axis2

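Using the commands sort and comm, we can identify the lines common to both files, the lines unique to file1, and the lines unique to file2. The comm command compares two sorted files line by line and, by default, prints three columns: lines found only in the first file, lines found only in the second file, and lines found in both. The options -1, -2 and -3 suppress the corresponding column, and the steps below combine them.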
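To see the three columns before any of them are suppressed, the sketch below runs comm with no options on sorted copies of the sample files. It is a POSIX sh script that assumes the sample listing splits into file1.txt (the ten names in the left column) and file2.txt (the eight names in the right column), and it works inside a scratch directory from mktemp -d so that existing files are not overwritten.

```shell
#!/bin/sh
# Sketch: comm's unfiltered three-column output for the sample files.
# Assumption: file1.txt = left column of names, file2.txt = right column.
set -e

# Work in a throwaway directory so no existing file1.txt/file2.txt is touched.
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' satish devarapalli java linux memory slf4j osgi tomcat apache axis2 > file1.txt
printf '%s\n' satish apache slf4j osgi java linux axis2 tomcat > file2.txt

# comm needs sorted input (step 1 of the list).
sort file1.txt > file1_sorted.txt
sort file2.txt > file2_sorted.txt

# No options: column 1 = lines only in file1 (no indent),
# column 2 = lines only in file2 (one leading tab),
# column 3 = lines in both files (two leading tabs).
comm file1_sorted.txt file2_sorted.txt
```

With this data, devarapalli and memory appear in column 1, the other eight names appear in column 3, and column 2 stays empty because every name in file2.txt also occurs in file1.txt. Remove "$workdir" when you are done.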

  1. The comm command expects sorted input, so the first step is to sort both file1.txt and file2.txt:
    • sort file1.txt > file1_sorted.txt
    • sort file2.txt > file2_sorted.txt
  2. Find lines common to both files
    • comm -12 file1_sorted.txt file2_sorted.txt | nl
      • Option “-12” suppresses column 1 (lines unique to file1_sorted.txt) and column 2 (lines unique to file2_sorted.txt), leaving only the lines common to both files.
      • “| nl” pipes the output through nl, which adds line numbers.
  3. Find lines unique to file2_sorted.txt
    • comm -13 file1_sorted.txt file2_sorted.txt | nl
      • Option “-13” suppresses column 1 (lines unique to file1_sorted.txt) and column 3 (lines common to both files), leaving only the lines unique to file2_sorted.txt.
  4. Find lines unique to file1_sorted.txt
    • comm -23 file1_sorted.txt file2_sorted.txt | nl
      • Option “-23” suppresses column 2 (lines unique to file2_sorted.txt) and column 3 (lines common to both files), leaving only the lines unique to file1_sorted.txt.
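The four steps can be bundled into one reusable shell function. This is a minimal POSIX sh sketch, not the author's own script: the function name compare_files and the section headings are hypothetical, and the sorted copies are written to a temporary directory created with mktemp -d instead of to file1_sorted.txt and file2_sorted.txt.

```shell
#!/bin/sh
# Hypothetical helper: runs steps 1-4 on two files and prints three
# numbered sections. Sorted copies live in a temp dir that is removed
# at the end, so the input files are never modified.
# (POSIX sh has no "local", so tmp is a global variable here.)
compare_files() {
    tmp=$(mktemp -d) || return 1

    # Step 1: comm requires sorted input.
    if ! sort "$1" > "$tmp/a" || ! sort "$2" > "$tmp/b"; then
        rm -rf "$tmp"
        return 1
    fi

    # Step 2: -12 drops columns 1 and 2, keeping lines common to both.
    echo "== Common to $1 and $2 =="
    comm -12 "$tmp/a" "$tmp/b" | nl

    # Step 3: -13 drops columns 1 and 3, keeping lines only in the second file.
    echo "== Only in $2 =="
    comm -13 "$tmp/a" "$tmp/b" | nl

    # Step 4: -23 drops columns 2 and 3, keeping lines only in the first file.
    echo "== Only in $1 =="
    comm -23 "$tmp/a" "$tmp/b" | nl

    rm -rf "$tmp"
}

# Usage: compare_files file1.txt file2.txt
```

Calling compare_files file1.txt file2.txt prints the same results as steps 2 to 4. A section with no matching lines, such as "Only in file2.txt" for the sample data, shows just its heading, because nl prints nothing for empty input.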
