Thursday, January 8, 2015

Find Duplicate Files in a Folder or Over a Network: MD5 Checksum

Manually finding duplicate files in a folder or on a computer hard drive is not an easy task. Instead, we can automatically calculate the MD5 hash of every file and then compare the hash values. If two files have the same hash value, we can treat them as having the same content. (Two different files can in theory share an MD5 hash, but this is extremely unlikely to happen by accident.)


This raises the question: what is an MD5 hash value, or a checksum? A checksum is a small, fixed-size value computed from an arbitrary block of digital data; an MD5 hash is 128 bits, usually written as 32 hexadecimal characters. In general terms, it is a fingerprint of a file. We can calculate the checksum of every file we want to check and, by comparing the hash values, easily find the duplicate files.
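To make the fingerprint idea concrete, here is a minimal sketch in Python rather than fciv. It uses the standard hashlib module; the helper name md5_of_bytes is only an illustration and is not part of any tool mentioned in this post.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


# Hypothetical sample data, chosen only for illustration.
same_a = md5_of_bytes(b"hello world")
same_b = md5_of_bytes(b"hello world")
different = md5_of_bytes(b"hello world!")

print(len(same_a))          # 32 hex characters = 128 bits
print(same_a == same_b)     # identical content gives an identical checksum
print(same_a == different)  # one extra byte gives a different checksum
```

The same input always produces the same 32-character value, which is what makes the comparison approach work.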

We can also use this technique to check the integrity of a file when it is transferred over the network. Suppose we want to copy a list of files from a server. How can we make sure that a copied file was not corrupted in transit? Here we can apply the checksum process: calculate the hash of the original and of the copy, and confirm that the two values match.
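The integrity check described above could be sketched in Python as follows. This is an illustration under assumptions, not the fciv method: the function names md5_of_file and copied_intact are hypothetical, and the 64 KB chunk size is an arbitrary choice so that large files are not loaded into memory at once.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copied_intact(source_path, copied_path):
    """Return True if both files produce the same MD5 checksum."""
    return md5_of_file(source_path) == md5_of_file(copied_path)
```

For example, copied_intact(original, copy) returns False as soon as a single byte of the copy differs from the original.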

Below is the step-by-step process to calculate the MD5 checksum hash value of a file.

1. First of all, we need to download a Microsoft tool called fciv (File Checksum Integrity Verifier). Click here to download the required fciv file. You can also search Google for "fciv download" to find the file.

2. Run the .exe file that you downloaded in step 1. It will prompt you to select a location.

step 2

3. The file fciv.exe is extracted to the location you provided in step 2.

step 3

4. Open a command prompt and change the directory to the location where you have kept the fciv.exe file.

step 4

5. Execute the command: fciv [drive:][path][file name]. A checksum value will be generated after executing the command. See the highlighted value in the screenshot below.

step 5

For each fciv call, take the first word of the last line of the output, e.g. 3fa858b65003387d38e5392b1743cce9
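If you capture the fciv output in a script, the "first word of the last line" rule can be sketched in Python as below. The function name checksum_from_fciv_output is hypothetical, and the sample text, including the file path, is an illustrative stand-in rather than output captured from a real run.

```python
def checksum_from_fciv_output(output: str) -> str:
    """Return the first word of the last non-empty line of the output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty fciv output")
    # Lower-case the value so comparisons do not depend on letter case.
    return lines[-1].split()[0].lower()


# Assumed sample: comment lines followed by "<hash> <path>" on the last line.
sample_output = """//
// File Checksum Integrity Verifier (illustrative sample)
//
3fa858b65003387d38e5392b1743cce9 c:\\temp\\report.txt
"""

print(checksum_from_fciv_output(sample_output))
# prints 3fa858b65003387d38e5392b1743cce9
```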

If you re-execute the command, you will get the same value for the same file. If you execute the command on several duplicate files, you will get the same hash value for each of them. So by comparing the hash values you will be able to find the duplicate files.

If the command prints the same hash value for different files, it means the files are identical, with the same content.

You can also execute the command for a file in a shared folder on a different computer. We just need to provide the path of the shared folder.

In this way we can check:

1. The duplicate files in a folder or on your computer.

2. Whether two files are the same, by comparing their hash values. If the hash value is the same for all the files, the files are identical; if the hash values differ, the files are different.
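The whole idea of grouping files by checksum can also be sketched in a few lines of Python, as an alternative to running fciv on each file by hand. This is a minimal sketch under assumptions: the names md5_of_file and find_duplicates are hypothetical, and the folder is scanned recursively, which may or may not be what you want.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Map each MD5 value shared by two or more files to their paths."""
    paths_by_hash = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            paths_by_hash[md5_of_file(path)].append(path)
    # Keep only the hashes that occur more than once: those are duplicates.
    return {
        digest: sorted(paths)
        for digest, paths in paths_by_hash.items()
        if len(paths) > 1
    }
```

Calling find_duplicates on a folder returns a dictionary in which every entry lists files with identical content. The folder argument can be a local path or, on Windows, the path of a shared folder on another computer.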

Please let me know your queries and feedback in the comments section. I will reply to your queries as early as possible.

You may like the post:

How to check the difference between two files using command prompt.