Duplicate File Finder



G'day :)

 

I have been thinking of doing this for a very long time: writing a small utility I can use to find duplicate files on a drive. I am sure you can find similar tools online, but for me this is a fun project.

 

You can grab the latest version from http://www.codelake.com/downloads/DuplicateFileFinder.zip

 

Created in Visual Studio 2012, .NET Framework 4.5.1 using C#

 

I will keep adding more features to it as I find time. Please let me know if you have anything specific in mind and I shall try to fit it into the development schedule.

 

Cheers :)


G'day :)

 

Ok new version is up. Get it from http://www.codelake.com/downloads/DuplicateFileFinder.1.2014.0612.0138.zip

 

* Add better filtering mechanism

* Save and Restore settings

* Ability to delete a file by selecting it in the result set and pressing the Delete key (it will be moved to the Recycle Bin if possible)

 

Cheers :)


Cool I love .NET apps! :) I just tried it out. It found 40 duplicate files here :) Some things I noticed (which will most probably be improved as this is work in progress):

 

- I can't see the full path of the files because most of it is outside the app's boundaries due to the other info (file size, created, ...) being shown

- split pane cannot be resized (I tried this because of the previous issue)

- the minimum size of the window is rather big

 

Is this programmed in VB.NET or C#?

 

Edit: whoops, the split pane can be resized, it just has a minimum width. And I was too tired to notice the horizontal scrollbar, my bad :)

 

Thanks for sharing and keep up the good work!


Thanks Raphael.

 

It is done in C#.

 

* The minimum size of the window is 960 x 720 px. I can make it a little smaller, but I figured everyone would have a larger screen than that.

* The split pane's minimum width is 300 px. I can make it smaller.

* I am planning to move the File Path column before the Size and Date columns.

 

Any other suggestions?


astropheed, on 16 Jun 2014 - 13:25, said:

Any chance for an open source for inquiring minds (ie. me)?

Happy to share almost all of the code, but I have a small company and this is part of a utility suite of applications, so I am not sure of the legal implications in the future. The utility suite is free anyway, so I don't mind sharing snippets of the code if you want. From your other thread, https://www.neowin.net/forum/topic/1216223-going-to-c-from-heavy-python-experience/, I can see you want to get your head around the UI bit, so I am happy to help there too.


Ah no worries, don't want to make things difficult.

Not making it difficult at all. Just ask how certain things were done and I shall post the answer :)

Just can't post the whole thing together, if you know what I mean :D


Skiver, on 16 Jun 2014 - 21:04, said:

Just as a heads up as I don't want you running into any legal issues but there is this company who do the same thing with the same file name http://www.ashisoft.com/

Thanks :)

 

I have already checked. There are heaps more with the same name so I don't think it would be an issue unless someone has a trademark on that name.


  • 2 weeks later...

I tried the last build. Very good! I like the smart help too (I'm not sure it is new, I just noticed it now).

A nice to have: "Go to location" when right clicking on a file. I know there is an option to copy the full path, but "go to" would eliminate the need to open explorer and paste the path.


Raphaël, on 01 Jul 2014 - 06:58, said:

I tried the last build. Very good! I like the smart help too (I'm not sure it is new, I just noticed it now).

A nice to have: "Go to location" when right clicking on a file. I know there is an option to copy the full path, but "go to" would eliminate the need to open explorer and paste the path.

 

Thanks. The smart help has always been there but not very visual. I intend to make it more visual though.

 

Yes I am planning to add "Go To" rather than just copy.


New version (1.2014.0703.0246) is up.

 

Get it from http://www.codelake.com/downloads/DuplicateFileFinder.zip

 

* Add option to open file location in the right click menu
* Add option to choose the combination of properties when performing simple search
* Add option of intelligent hash matching when performing advanced search
* Add option to specify normal file search filter
* Add option to specify regular expression file search filter
* Add option to just find files based on the filter options (Not looking for duplicates)
* Add smart help button and other cosmetic changes
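
For readers curious how options like these might fit together, here is a minimal sketch in Python. The utility itself is written in C# and its source is not public, so none of this is the author's code. It assumes that the "normal" filter is a wildcard pattern such as *.jpg, that the regular expression filter is applied to the file name, and that a simple search groups files by a chosen combination of properties. All function names (`matches`, `file_key`, `simple_search`) are hypothetical.

```python
import fnmatch
import os
import re
from collections import defaultdict


def matches(name, pattern, use_regex=False):
    """Return True if a file name passes the filter (hypothetical helper)."""
    if use_regex:
        return re.search(pattern, name, re.IGNORECASE) is not None
    # fnmatch implements shell-style wildcards such as *.jpg;
    # lowercasing both sides makes the match case-insensitive everywhere.
    return fnmatch.fnmatch(name.lower(), pattern.lower())


def file_key(path, props):
    """Build a comparison key from the chosen combination of properties."""
    st = os.stat(path)
    values = {
        "name": os.path.basename(path).lower(),
        "size": st.st_size,
        "modified": int(st.st_mtime),
    }
    return tuple(values[p] for p in props)


def simple_search(root, pattern="*", use_regex=False, props=("name", "size")):
    """Group files under root whose chosen properties are all equal."""
    groups = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not matches(name, pattern, use_regex):
                continue
            path = os.path.join(folder, name)
            try:
                groups[file_key(path, props)].append(path)
            except OSError:
                continue  # file vanished or is unreadable: skip it
    # Only groups with two or more members are duplicate candidates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Note that equal name and size only make two files candidates; they do not prove the contents are identical, which is presumably why a hash-based advanced search exists as well.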


I have just started using your utility and I like the interface. I will let you know if I have more suggestions.

You can save a lot of the time required for the initial scan by using the MFT (the NTFS Master File Table), as it already has most of the data you need, just like an index.

It may be difficult to implement support for that, but it's worth the effort considering the number of files we have these days.

Have a look at WizTree, which can also save the MFT to a file.
https://antibody-software.com/web/software/software/wiztree-finds-the-files-and-folders-using-the-most-disk-space-on-your-hard-drive/

I was able to get the disk usage of a 4 TB disk with approx. 1 million files in 1 minute. Normally it would take a lot of time.

Several other programs make use of the MFT as well.

 

While writing this post, it threw up an exception during scan as below and exited.

----- < CodeLake Exception Message > --------------------------------------------
An unexpected error has occurred. Application.ThreadException


----- < CodeLake Exception Details > --------------------------------------------
System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Windows.Forms.Control.MarshaledInvoke(Control caller, Delegate method, Object[] args, Boolean synchronous)
   at System.Windows.Forms.Control.Invoke(Delegate method, Object[] args)
   at CodeLake.Utilities.DuplicateFileFinder.FormMain.a(Object A_0, RunWorkerCompletedEventArgs A_1)
   at System.ComponentModel.BackgroundWorker.OnRunWorkerCompleted(RunWorkerCompletedEventArgs e)
   at System.ComponentModel.BackgroundWorker.AsyncOperationCompleted(Object arg)

I ran it a few times after posting this message initially.

I noticed that it throws the exception right after finishing the initial scan, just before it starts finding the duplicate files.

I tried running it both as admin and as a normal user; it failed the same way.

Edited by StarkWiz

StarkWiz, on 03 Jul 2014 - 16:59, said:

I noticed that it throws the exception right after finishing the initial scan, just before it starts finding the duplicate files.

 

Thanks for that exception report. Could you please run one more test on the same set of folder(s), but this time just click on Find Files? If that works, then the error is most probably due to the multithreading I introduced. It will be hard to debug, but I shall do my best.


I got an error as well. Very nice piece of work!

----- < CodeLake Exception Message > --------------------------------------------
An unexpected error has occurred. Application.ThreadException


----- < CodeLake Exception Details > --------------------------------------------
System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Windows.Forms.Control.MarshaledInvoke(Control caller, Delegate method, Object[] args, Boolean synchronous)
   at System.Windows.Forms.Control.Invoke(Delegate method, Object[] args)
   at CodeLake.Utilities.DuplicateFileFinder.FormMain.a(Object A_0, RunWorkerCompletedEventArgs A_1)
   at System.ComponentModel.BackgroundWorker.OnRunWorkerCompleted(RunWorkerCompletedEventArgs e)
   at System.ComponentModel.BackgroundWorker.AsyncOperationCompleted(Object arg)

My system specs:

Computer:      GIGABYTE X58A-UD3R
CPU:           Intel Core i7-980 (Gulftown, B1)
               3333 MHz (25.00x133.3) @ 3899 MHz (27.00x144.4)
Motherboard:   GIGABYTE X58A-UD3R
Chipset:       Intel X58 (Tylersburg 36S) + ICH10R
Memory:        24576 MBytes @ 722 MHz, 10.0-9-9-24
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
Graphics:      EVGA e-GeForce GTX 460 SE
               NVIDIA GeForce GTX 460 SE, 1024 MB GDDR5 SDRAM
Graphics:      EVGA e-GeForce GTX 460 SE
               NVIDIA GeForce GTX 460 SE, 1024 MB GDDR5 SDRAM
Drive:         Hitachi HDT721010SLA360, 976.8 GB, Serial ATA 3Gb/s
Drive:         ST2000DL003-9VT166, 1953.5 GB, Serial ATA 6Gb/s @ 3Gb/s
Drive:         WDC WD1002FAEX-00Z3A0, 976.8 GB, Serial ATA 6Gb/s
Drive:         HP DVD Writer 300n, DVD+R Writer
Sound:         Intel ICH10 - High Definition Audio Controller [A0]
Sound:         NVIDIA GF104 - High Definition Audio Controller
Sound:         NVIDIA GF104 - High Definition Audio Controller
Network:       RealTek Semiconductor RTL8168/8111 PCI-E Gigabit Ethernet NIC
OS:            Microsoft Windows 8.1 Professional (x64) Build 9600
 
Got exactly same error.

I tried on different drives and noticed that I don't get this error on drives with fewer files.

The drive on which I got the error has around 82k files.


Thanks for the bug reports, guys. I am sure it is due to the multithreading, but I can't reproduce it, so it is very hard to debug. For the time being, I am reverting that code change.

 

New version (1.2014.0703.2350) is up.

 

Get it from http://www.codelake.com/downloads/DuplicateFileFinder.zip

 

* Removed multithreading due to reliability issues.
* Changed the min and max size options and made them independent of one another.
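
The thread never pins down the exact cause of the NullReferenceException, so the following does not diagnose it. It is only a sketch, in Python rather than the utility's C#/WinForms, of one general pattern that tends to keep multithreaded file hashing reliable: worker threads only read a file and return a value, and a single thread collects the results. The names `hash_file` and `hash_all` and the worker count are assumptions made for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed


def hash_file(path):
    """Worker: read one file and return its digest; touches no shared state."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(64 * 1024), b""):
            h.update(block)
    return h.hexdigest()


def hash_all(paths, workers=4):
    """Hash files in a thread pool; only the calling thread fills the dicts."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(hash_file, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except OSError as exc:
                # A failure in one worker is recorded, not left to crash the scan.
                errors[path] = exc
    return results, errors
```

Because the workers never write to shared collections or to the user interface, a failure on one file surfaces as an entry in `errors` instead of an unhandled exception when the scan completes.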

Thanks for the stats.

 

I ran an advanced search on a single 36.9 GB folder with 436639 files and 61814 subfolders. It took 1 hour to find all the duplicate files, but with no errors so far. I have reverted the multithreading code anyway, so it shouldn't be an issue for now.
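
On scan time in general: the thread does not describe what "intelligent hash matching" does internally, so the following is only an assumption about how a hash-based search can avoid reading every file in full. It is a Python sketch (not the author's C# code) that narrows candidates in three stages: same size, then same hash of the first chunk, then same full hash. `digest`, `regroup`, `find_duplicates` and the 64 KB chunk size are hypothetical choices.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024  # read size in bytes; an arbitrary choice for this sketch


def digest(path, first_chunk_only=False):
    """SHA-256 of a file, or of just its first chunk for a cheap pre-check."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            h.update(block)
            if first_chunk_only:
                break
    return h.hexdigest()


def regroup(groups, key_func):
    """Split each candidate group by key_func, keeping only groups of 2+."""
    result = []
    for paths in groups:
        buckets = defaultdict(list)
        for path in paths:
            try:
                buckets[key_func(path)].append(path)
            except OSError:
                continue  # unreadable file: drop it from the candidates
        result.extend(b for b in buckets.values() if len(b) > 1)
    return result


def find_duplicates(paths):
    """Return groups of paths whose contents hash identically."""
    groups = regroup([list(paths)], os.path.getsize)  # cheap: metadata only
    groups = regroup(groups, lambda p: digest(p, first_chunk_only=True))
    return regroup(groups, digest)  # expensive: full read, few files left
```

Files with a unique size are never opened at all, and files that differ within the first chunk are read only once, briefly, so the full hash runs on a much smaller set.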

 

Apologies for any troubles caused by this.
