wrack

Duplicate File Finder

30 posts in this topic

G'day :)

 

I have been thinking of doing this for a very long time: writing a small utility that I can use to find duplicate files on a drive. I am sure you can find one online too, but for me this is a fun project.

 

You can grab the latest version from http://www.codelake.com/downloads/DuplicateFileFinder.zip

 

Created in Visual Studio 2012, .NET Framework 4.5.1 using C#

 

I will keep adding more features to it as I find time. Please let me know if you have anything specific in mind and I shall try to fit it into the development schedule.

 

Cheers :)


G'day :)

 

Ok new version is up. Get it from http://www.codelake.com/downloads/DuplicateFileFinder.1.2014.0612.0138.zip

 

* Add better filtering mechanism

* Save and Restore settings

* Ability to delete a file by selecting it in the result set and pressing the Delete key (it will be moved to the Recycle Bin if possible)

 

Cheers :)


Cool I love .NET apps! :) I just tried it out. It found 40 duplicate files here :) Some things I noticed (which will most probably be improved as this is work in progress):

 

- I can't see the full path of the files because most of it is outside the app's boundaries due to the other info (file size, created, ...) being shown

- split pane cannot be resized (I tried this because of the previous issue)

- the minimum size of the window is rather big

 

Is this programmed in VB.NET or C#?

 

Edit: whoops, the split pane can be resized, it just has a minimum width. And I was too tired to notice the horizontal scrollbar, my bad :)

 

Thanks for sharing and keep up the good work!


Thanks Raphael.

 

It is done in C#.

 

* The minimum size of the window is 960 x 720 px. I can make it a little smaller, but I figured everyone would have a larger screen than that.

* The split pane's minimum size is 300. I can make it smaller.

* I am planning to move the File Path column before the Size and Date columns.

 

Any other suggestions?


Any chance of open-sourcing it for inquiring minds (i.e. me)?

astropheed, on 16 Jun 2014 - 13:25, said:

Any chance of open-sourcing it for inquiring minds (i.e. me)?

Happy to share almost all of the code, but I have a small company and this is part of a utility suite of applications, so I am not sure of the legal implications (in future). The utility suite is free anyway, so I don't mind sharing snippets of the code if you want. From your other thread, http://www.neowin.net/forum/topic/1216223-going-to-c-from-heavy-python-experience/, I can see you want to get your head around the UI bit, so I am happy to help there too.



Ah no worries, don't want to make things difficult.


Not making it difficult at all. Just ask how certain things were done and I shall post the answer :)

Just can't post the whole thing together, if you know what I mean :D


Just as a heads up as I don't want you running into any legal issues but there is this company who do the same thing with the same file name http://www.ashisoft.com/

Skiver, on 16 Jun 2014 - 21:04, said:

Just as a heads up as I don't want you running into any legal issues but there is this company who do the same thing with the same file name http://www.ashisoft.com/

Thanks :)

 

I have already checked. There are heaps more with the same name so I don't think it would be an issue unless someone has a trademark on that name.


New version (1.2014.0617.0015) is up. Get it at http://www.codelake.com/downloads/DuplicateFileFinder.zip

 

* Make the minimum size of the folder tree smaller.

* Removed the final message dialog and now showing information in the status bar.
* Exclude Recycle Bin and System Volume Information folders while scanning.
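
For readers curious how such an exclusion can work, here is a minimal sketch in Python (not the utility's actual C# code, which was not posted). The folder names in `EXCLUDED_DIRS` and the helper name `iter_files` are assumptions made for illustration; the release note only mentions the Recycle Bin and System Volume Information.

```python
import os

# Assumed on-disk folder names, compared case-insensitively.
# The release note only says "Recycle Bin and System Volume Information".
EXCLUDED_DIRS = {"$recycle.bin", "system volume information"}


def iter_files(root, excluded=EXCLUDED_DIRS):
    """Yield every file path under root, skipping excluded folder names."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d.lower() not in excluded]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Because the function is a generator, paths are produced one at a time instead of being collected into one large list first.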


New version (1.2014.0619.0547) is up.

Get it at http://www.codelake.com/downloads/DuplicateFileFinder.zip

* Reduced memory footprint while scanning a large number of files, especially when filters are in place.

* Better performance while using filters.

* Ability to sort the result list.


New version (1.2014.0629.0832) is up.

 

Get it at http://www.codelake.com/downloads/DuplicateFileFinder.zip

 

* Added the ability to filter by file extension, so you can now search for specific file types such as JPG, PNG, etc.
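
An extension filter of this kind can be sketched roughly as follows. This is an illustrative Python sketch under assumptions, not the utility's implementation: the helper names `normalise_extensions` and `has_extension` are hypothetical, and the accepted input forms ("JPG", ".png", "*.gif") are a guess at what a user might type.

```python
import os


def normalise_extensions(extensions):
    """Turn inputs like 'JPG', '.png' or '*.gif' into {'.jpg', '.png', '.gif'}."""
    cleaned = set()
    for ext in extensions:
        ext = ext.strip().lower().lstrip("*")
        if not ext:
            continue  # ignore empty entries
        cleaned.add(ext if ext.startswith(".") else "." + ext)
    return cleaned


def has_extension(path, extensions):
    """Return True when the file's extension is in the set (case-insensitive)."""
    return os.path.splitext(path)[1].lower() in extensions
```

For example, after `allowed = normalise_extensions(["JPG", "PNG"])`, calling `has_extension("Holiday.JPG", allowed)` returns `True` and `has_extension("notes.txt", allowed)` returns `False`.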


I tried the last build. Very good! I like the smart help too (I'm not sure it is new, I just noticed it now).

A nice to have: "Go to location" when right clicking on a file. I know there is an option to copy the full path, but "go to" would eliminate the need to open explorer and paste the path.


New version (1.2014.0703.0246) is up.

 

Get it at http://www.codelake.com/downloads/DuplicateFileFinder.zip

 

* Add option to open file location in the right click menu
* Add option to choose the combination of properties when performing simple search
* Add option of intelligent hash matching when performing advanced search
* Add option to specify normal file search filter
* Add option to specify regular expression file search filter
* Add option to just find files based on the filter options (Not looking for duplicates)
* Add smart help button and other cosmetic changes
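
The thread does not say what "intelligent hash matching" means inside the utility. One common approach to hash-based duplicate detection, shown here as a Python sketch under that assumption, is to group files by size first and hash only the files that share a size with at least one other file. The function names and the choice of SHA-256 are illustrative, not taken from the utility.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are never loaded into memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths with identical content (same size, same SHA-256)."""
    by_size = defaultdict(list)
    for path in paths:
        try:
            by_size[os.path.getsize(path)].append(path)
        except OSError:
            continue  # unreadable or vanished file: skip it

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in same_size:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
        groups.extend(group for group in by_hash.values() if len(group) > 1)
    return groups
```

The size check is cheap because it only reads file metadata, so on a typical drive most files are ruled out without ever being opened.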


I have just started using your utility and I like the interface. I will let you know if I have more suggestions.

You can save a lot of the time required for the initial scan by reading the MFT (the NTFS Master File Table), as it already holds most of the data you need, just like an index.

It may be difficult to implement support for that, but it's worth the effort considering the number of files we have these days.

Have a look at WizTree, which can also save the MFT to a file.
https://antibody-software.com/web/software/software/wiztree-finds-the-files-and-folders-using-the-most-disk-space-on-your-hard-drive/

I was able to get the disk usage of a 4 TB disk with approx. 1 million files in 1 minute. Normally it would take a lot longer.

There are a few other programs that make use of the MFT as well.

While writing this post, it threw an exception during the scan, shown below, and exited.

----- < CodeLake Exception Message > --------------------------------------------
An unexpected error has occurred. Application.ThreadException


----- < CodeLake Exception Details > --------------------------------------------
System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Windows.Forms.Control.MarshaledInvoke(Control caller, Delegate method, Object[] args, Boolean synchronous)
   at System.Windows.Forms.Control.Invoke(Delegate method, Object[] args)
   at CodeLake.Utilities.DuplicateFileFinder.FormMain.a(Object A_0, RunWorkerCompletedEventArgs A_1)
   at System.ComponentModel.BackgroundWorker.OnRunWorkerCompleted(RunWorkerCompletedEventArgs e)
   at System.ComponentModel.BackgroundWorker.AsyncOperationCompleted(Object arg)

I ran it a few times after initially posting this message.

I noticed that it throws the exception right after finishing the initial scan, just before it starts finding the duplicate files.

I tried running it both as admin and as a normal user, and it failed the same way.

Edited by StarkWiz


Will try this!  Thanks mate.

StarkWiz, on 03 Jul 2014 - 16:59, said:

While writing this post, it threw up an exception during scan as below and exited.

[...]


Thanks for that exception report. Could you please run one more test on that same set of folder(s), but this time just click on Find Files? If that works, then the error is most probably due to the multithreading I introduced. It will be hard to debug but I shall do my best.

Nashy, on 03 Jul 2014 - 17:08, said:

Will try this!  Thanks mate.

 

Welcome :)


I got an error as well. Very nice piece of work!

----- < CodeLake Exception Message > --------------------------------------------
An unexpected error has occurred. Application.ThreadException


----- < CodeLake Exception Details > --------------------------------------------
System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Windows.Forms.Control.MarshaledInvoke(Control caller, Delegate method, Object[] args, Boolean synchronous)
   at System.Windows.Forms.Control.Invoke(Delegate method, Object[] args)
   at CodeLake.Utilities.DuplicateFileFinder.FormMain.a(Object A_0, RunWorkerCompletedEventArgs A_1)
   at System.ComponentModel.BackgroundWorker.OnRunWorkerCompleted(RunWorkerCompletedEventArgs e)
   at System.ComponentModel.BackgroundWorker.AsyncOperationCompleted(Object arg)

My system specs:

Computer:      GIGABYTE X58A-UD3R
CPU:           Intel Core i7-980 (Gulftown, B1)
               3333 MHz (25.00x133.3) @ 3899 MHz (27.00x144.4)
Motherboard:   GIGABYTE X58A-UD3R
Chipset:       Intel X58 (Tylersburg 36S) + ICH10R
Memory:        24576 MBytes @ 722 MHz, 10.0-9-9-24
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
               - 4096 MB PC10600 DDR3 SDRAM - A-DATA Technology 
Graphics:      EVGA e-GeForce GTX 460 SE
               NVIDIA GeForce GTX 460 SE, 1024 MB GDDR5 SDRAM
Graphics:      EVGA e-GeForce GTX 460 SE
               NVIDIA GeForce GTX 460 SE, 1024 MB GDDR5 SDRAM
Drive:         Hitachi HDT721010SLA360, 976.8 GB, Serial ATA 3Gb/s
Drive:         ST2000DL003-9VT166, 1953.5 GB, Serial ATA 6Gb/s @ 3Gb/s
Drive:         WDC WD1002FAEX-00Z3A0, 976.8 GB, Serial ATA 6Gb/s
Drive:         HP DVD Writer 300n, DVD+R Writer
Sound:         Intel ICH10 - High Definition Audio Controller [A0]
Sound:         NVIDIA GF104 - High Definition Audio Controller
Sound:         NVIDIA GF104 - High Definition Audio Controller
Network:       RealTek Semiconductor RTL8168/8111 PCI-E Gigabit Ethernet NIC
OS:            Microsoft Windows 8.1 Professional (x64) Build 9600
 


Thanks for that exception report. Could you please run one more test on that same set of folder(s), but this time just click on Find Files? If that works, then the error is most probably due to the multithreading I introduced. It will be hard to debug but I shall do my best.

Got exactly the same error.

I tried on different drives and noticed that I don't get this error on drives with fewer files.

The drive on which I got the error has around 82k files.


Thanks for the bug reports, guys. I am sure it is due to the multithreading, but I can't reproduce it, so it is very hard to debug. For the time being, I am reverting that code change.

 

New version (1.2014.0703.2350) is up.

 

Get it at http://www.codelake.com/downloads/DuplicateFileFinder.zip

 

* Removed multithreading due to reliability issues.
* Changed the min and max size options and made them independent of one another.


Got exactly the same error.

I tried on different drives and noticed that I don't get this error on drives with fewer files.

The drive on which I got the error has around 82k files.

Thanks for the stats.

 

I ran an advanced search on a single 36.9 GB folder with 436639 files and 61814 subfolders; it took 1 hour to find all the duplicate files, with no errors so far. I have reverted the multithreading code anyway, so it shouldn't be an issue for now.

 

Apologies for any troubles caused by this.
