• 0

[C++] Test if two files are the same?


Question

I'm not looking to test if two files are duplicates, I know you can do that by doing a byte by byte comparison, or hashing both files.

What I'm looking for is a way to test if both files are literally the same file. For example:

c:\documents and settings\bla.txt

c:\docume~1\bla.txt

When comparing the strings, those might be seen as two different files, when really they're the same file. I could convert both strings to short file names, but I'm not sure if Windows has other ways of linking files to eachother.

In brief, I need a foolproof way to test if two files are the same file or different.

Link to comment
https://www.neowin.net/forum/topic/349791-c-test-if-two-files-are-the-same/
Share on other sites

21 answers to this question

Recommended Posts

  • 0

Well i can just give you puesudo code:

Read both files and store the text inside, into two seperate strings

--> If string 1 == string 2

Do what you want to happen

You will have to check yourself, or let someone else find out how to read the files, since i haven't learned about reading files yet.

  • 0

So your question is how are you supposed to see if two files are identical without comparing contents of them!?

Btw, love your avatar - reminds me of when I was creating demos in DOS and I made effect that looked exactly as your avatar... heh..

  • 0

so let me see if i'm getting the question.

u have 1 file but have 2 different strings for the paths:

as in

Path 1 is C:\Docs\abc.txt

Path 2 is D:\Prog\Desktop\abc.txt

put the two paths in 2 string arrays. parse the arrays backwards till u reach the '\', from that point on again go forward till the end of the array and store this in 2 new arrays.

now compare these 2 arrays to chk if they are the same.

  • 0
  df_dukkar said:
so let me see if i'm getting the question.

u have 1 file but have 2 different strings for the paths:

as  in

Path 1 is C:\Docs\abc.txt

Path 2 is D:\Prog\Desktop\abc.txt

put the two paths in 2 string arrays. parse the arrays backwards till u reach the '\', from that point on again go forward till the end of the array and store this in 2 new arrays.

now compare these 2 arrays to chk if they are the same.

586278450[/snapback]

He wants to see if the files are the same, i.e. they contain the same stuff, but for some reason he doesn't want to open the file or uses hashes like an MD5sum.

  • 0

Sorry, let me reexplain :pinch:

c:\documents and settings\file.txt

c:\docume~1\file.txt

Technically, both are the same file, but doing a string comparison would say differently.

I have a function which has two parameters, inputfile and outputfile. If the outputfile is different than the inputfile, it will truncate the outputfile. But if the inputfile and outputfile are the same file, then it'll overwrite the current file.

I already mentioned that I could convert both strings to a short filename, but I'm unsure if there are other things to consider. For example, %systemroot%.

So I need a "foolproof" way of testing if the two strings both literally represent the same file. If converting to a short filename would be adequate, could someone give me a function that can do this? I was unable to find anything on Google or MSDN :blush:

And sorry for not being clearer in my first post :(

Btw, love your avatar - reminds me of when I was creating demos in DOS and I made effect that looked exactly as your avatar... heh..

My avatar is a XOR effect, (X ^ Y) ;)

Edit: Just to be absolutely sure I don't confuse anyone again...

I'm NOT looking to compare the contents of files.

I'm only looking to see if two file paths are the same.

eg. c:\docume~1\file.txt IS c:\documents and settings\file.txt. Only difference is one is the short file name and the other is the long file name.

Edited by xinok
  • 0

I dont really get why you would like to do that since they are always the same. That means instead of writing

cd "c:\documents and settings"

you can always use

cd c:\docume~1\

But if you really want to do that then it's only matter of stripping strings.

Reason that it stands docume~1 is that DOS cannot handle filenames larger than 8 characters, so "documents and settings" became "docume~1".

My recomendation is that you have some kind of translation table that translates docume~1 to its full name or use full path as input and strip it down to 6 characters and add "~1" to it.

  • 0
  xinok said:
but I'm not sure if Windows has other ways of linking files to eachother.

586278291[/snapback]

unfortunately windows does not have symlinks like in unix - this is how i understand link of similar (the same) files.

maybe i did not understand your post correctly, but if you want it not as part of some school assignment, what's the problem using 'comp' command?

i have it in the sendTo menu..

  • 0

NTFS supports both hard and soft links like ext2; these features are largely underused though.

There is an api function called GetShortPathName that will deal with 8.3 vs long format. The best way to detect all links/short/long would be to check which directory entry the directories ultimately point to. I think this approach would work only on a per-file-system basis (e.g., you need diff code for FAT32 and NTFS).

  • 0

Jayzee: I know what short file names are.

Andareed: Thanks for the GetShortPathName function

robotnic: I found this: CreateHardLink. It doesn't seem to exist in VC++ 6 though. I was hoping to create a hard link file, see what I can find out about them.

Taken from here:

A hard link to a file is indistinguishable from the original name for the file; there's no particular link that is more the "real name" for the file than any other.

I guess there isn't much I can do about hard link files.

Off Topic

Something thats just sort of bugging me, whats with all the typedef's and #define's in the C++ headers?

typedef LPCSTR LPCTSTR; typedef CONST CHAR *LPCSTR, *PCSTR; #define CONST const, typedef char CHAR; etc.

I really don't see the point. It just makes C++ harder to learn trying to memorize all these "types", and overall makes code harder to read if you don't know what a certain typedef or define is.

  • 0

I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them.

If you use CreateHardLink, you'll probably need to install the platform sdk and change the vc++ includes/libs directories. Interestingly, there is a new CreateSymbolicLink on msdn that only works with longhorn.

@xinok: these namings are generally acronyms. LPCTSTR means Long Pointer Const null-Terminating STRing. Another one is TCHAR, which is CHAR on ANSI and WCHAR on UNICODE. CHAR is char because win32 uses all caps for structures and primitives.

  • 0
  Andareed said:
I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them.

586280364[/snapback]

I'd appreciate it if I can get those apps :yes: And thanks ahead of time.
  • 0

Alright, I was able to solve this little riddle. First, creating a hard link file can be done from a command in windows:

fsutil hardlink create c:\output.txt c:\input.txt

Now testing for duplicates (I already tried this, it also locks hard link files):

We have the inputfile and outputfile

First, open the inputfile, but deny access to all other processes (lock the file)

Now try opening the outputfile. If it succeeds, the files are different. If it fails, continue...

Unlock the inputfile, and try opening the outputfile again. If it succeeds this time, then the files are the same. If it fails again, something else is wrong so return an error. :)

  • 0

Couldn't find softlink app but I based it on the code from here: http://www.codeproject.com/w2k/junctionpoints.asp

The hardlink app just uses CreateHardLink. There is also a sysinternals tool with source called junction: http://www.sysinternals.com/Utilities/Junction.html

  • 0
  df_dukkar said:
could u tell me how to lock down a file ??

586282181[/snapback]

I used the OpenFile function.

#include <windows.h>

char* file = "file.dat";

OFSTRUCT fileinfo;

long handle = OpenFile(file, &fileinfo, OF_SHARE_EXCLUSIVE);

  • 0
  Quote
PIDL's are only used in the shell.

That doesn't mean that you can still use them, if theu turn out to be usefull for your purpose.

You're right about hard and soft links tho. Maybe it's possible to use one of those Nt* API's to determine the file-block they 'link' too.

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • Can you point out where his walkthrough of Mozilla's finances are lies?
    • Advanced Renamer 4.12 by Razvan Serea Advanced Renamer is a program for renaming multiple files and folders at once. By configuring renaming methods the names can be manipulated in various ways. It is easy to set up a batch job using multiple methods on a large amount of files. The 14 different methods enables you to change the names, attributes, and timestamps of files in one go. Free for personal use. You can download and use Advanced Renamer for FREE for any personal use. If you use Advanced Renamer for a business you can download and try it out for free. To continue using it, you need to buy a life time license. Image files This mass file renamer is a great utility for organising digital pictures for both professionals and beginners. The thumbnail mode lets you display thumbnails directly in the file list giving you maximum control of the renaming process. With this program you can rename all your photos in a snap. GPS data If your image files contain GPS data you can add the name of the city and the country where the picture was taken. Coordinates are used to lookup city, country, and state names from a database containing more than 100,000 cities around the globe. Music files MP3 and other music files often have messed up names and contain weird characters. With Advanced Renamer you can change the names of your favourite music files to more suitable names using the built-in ID3 functions. Video files Ever wanted to add the codec or the resolution of a video to the filename? With the video tags you can add various information about video and audio content to the names. TV shows Add episode title or airdate to video files containing TV Shows after importing show information from the tvmaze.com website. Advanced Renamer 4.12 changelog: Upgraded regular expression engine for use in renaming methods Replace method: Named group substitution is now supported in regular expressions (e.g., (?.*) and ${name}) Program is now less likely to crash when config file is corrupted Fixed an edge case bug in List Replace method Fixed large file support in ExifTool integration Improved reading XMP metadata from MP4 files ExifTool field names sometimes showed up in lists where they were not supported Will no longer show error "Extension changed" when new name is blank Disc and DiscCount metadata now correctly recognized for MP3 files Item details would sometimes show the same fields multiple times Additional metadata fields is now supported for MP4 files: AudioFormat, AudioChannels, AudioSampleType, AudioSampleRate, CompressionID, CompressionName, BitDepth, XResolution, YResolution More robust handling of MP4 files with corrupted data Added support for extracting metadata from some older QuickTime .mov files Fixed an issue reading GPS metadata from image and video files, when formattet in a certain way Improved MKV file metadata support Added support for metadata fields AudioFormat, AudioChannels, and AudioSampleRate for AVI files Import from CSV did not remember the last used column index for original filename Fixed name collision rule "Append image sub second" When using name collision rule "Append image sub second", the rule will now be applied to all items in the list with the same name Improved performance for JPEG files containing long XMP Extended metadata MacOS: Item preview panel will now use embedded thumbnails for JPEGs for better performance Download: Advanced Renamer 4.12 | Portable ~12.0 MB (Free for personal use) Link: Advanced Renamer Home Page | Advanced Renamer Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • I had that problem, too. I sold my motherboard on eBay. After a user bought it, he was complaining that the RAM pins were bent. I'm like, how tf did that happen. He hasn't replied in 2 weeks or ever, eBay gave me back my money.
    • it looks the same only smaller icons and more clutter
    • PDF Shaper 15.2 by Razvan Serea PDF Shaper is a set of feature-rich PDF software that makes it simple to split, merge, watermark, sign, optimize, convert, encrypt and decrypt your PDF documents, also delete and move pages, extract text and images. The program is optimized for low CPU resource usage and operates in batch mode, allowing users to process multiple PDF files while doing other work on their computers. PDF Shaper is available in three editions - Free, Premium and Ultimate. Compare and pick edition which is suitable for you. Compatible with Windows 7, 8, 10, 11. Features: Extract text, images of other objects from single or multiple PDFs Convert PDF to Word RTF or image, or convert image to PDF Extract pages from PDF and save them as separate PDF files Combine several PDF files into single PDF Encrypt and decrypt PDF with password, set user permissions Rotate, crop and normalize pages, set meta data Add watermark or remove images from PDF Benefits: Easy-to-use, intuitive user interface Low CPU resource usage during any process, including conversion Free for personal use and for any non-commercial organization Supports Unicode characters Supports batch processing for any operation Small installation size PDF Shaper 15.2 changelog: Updated translations. Improved table positioning for better layout accuracy (DOC to PDF). Enhanced image resizing to more accurately reflect DPI settings (Image to PDF). Improved text positioning for consistent formatting (TXT to PDF). Optimized overall performance on 64-bit systems. Enhanced support for Unicode text documents. Fixed an issue where font settings were not applied correctly to text files Download: PDF Shaper 15.2 | 7.9 MB (Free for personal use only) Link: PDF Shaper Home Page | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
  • Recent Achievements

    • Reacting Well
      Alan- earned a badge
      Reacting Well
    • Week One Done
      IAMFLUXX earned a badge
      Week One Done
    • One Month Later
      Æhund earned a badge
      One Month Later
    • One Month Later
      CoolRaoul earned a badge
      One Month Later
    • First Post
      Kurotama earned a badge
      First Post
  • Popular Contributors

    1. 1
      +primortal
      493
    2. 2
      ATLien_0
      267
    3. 3
      +FloatingFatMan
      224
    4. 4
      +Edouard
      199
    5. 5
      snowy owl
      141
  • Tell a friend

    Love Neowin? Tell a friend!