[C++] Test if two files are the same?


Question

I'm not looking to test whether two files are duplicates; I know you can do that with a byte-by-byte comparison or by hashing both files.

What I'm looking for is a way to test if both files are literally the same file. For example:

c:\documents and settings\bla.txt

c:\docume~1\bla.txt

When comparing the strings, those might be seen as two different files, when really they're the same file. I could convert both strings to short file names, but I'm not sure if Windows has other ways of linking files to each other.

In brief, I need a foolproof way to test if two files are the same file or different.


21 answers to this question


  • 0

Well, I can only give you pseudocode:

Read both files and store the text from each in two separate strings

--> If string 1 == string 2

Do what you want to happen

You will have to check this yourself, or let someone else explain how to read the files, since I haven't learned about reading files yet.

  • 0

So your question is how you are supposed to see if two files are identical without comparing their contents?!

Btw, love your avatar - reminds me of when I was creating demos in DOS and I made an effect that looked exactly like your avatar... heh..

  • 0

So let me see if I'm getting the question.

You have one file but two different strings for the paths, as in:

Path 1 is C:\Docs\abc.txt

Path 2 is D:\Prog\Desktop\abc.txt

Put the two paths in two string arrays. Parse each array backwards until you reach the '\', then go forward from that point to the end of the array and store the result in two new arrays.

Now compare these two arrays to check whether they are the same.

  • 0

He wants to see if the files are the same, i.e. they contain the same stuff, but for some reason he doesn't want to open the files or use a hash such as an MD5 sum.

  • 0

Sorry, let me re-explain :pinch:

c:\documents and settings\file.txt

c:\docume~1\file.txt

Technically, both are the same file, but doing a string comparison would say differently.

I have a function which takes two parameters, inputfile and outputfile. If the outputfile is a different file from the inputfile, the function will truncate the outputfile. But if the inputfile and outputfile are the same file, then it'll overwrite the current file.

I already mentioned that I could convert both strings to a short filename, but I'm unsure if there are other things to consider. For example, %systemroot%.

So I need a "foolproof" way of testing if the two strings both literally represent the same file. If converting to a short filename would be adequate, could someone give me a function that can do this? I was unable to find anything on Google or MSDN :blush:

And sorry for not being clearer in my first post :(

"Btw, love your avatar - reminds me of when I was creating demos in DOS and I made effect that looked exactly as your avatar... heh.."

My avatar is a XOR effect, (X ^ Y) ;)

Edit: Just to be absolutely sure I don't confuse anyone again...

I'm NOT looking to compare the contents of files.

I'm only looking to see if two file paths are the same.

eg. c:\docume~1\file.txt IS c:\documents and settings\file.txt. Only difference is one is the short file name and the other is the long file name.

Edited by xinok
  • 0

I don't really get why you would want to do that, since they are always the same. That means instead of writing

cd "c:\documents and settings"

you can always use

cd c:\docume~1\

But if you really want to do that, then it's only a matter of stripping strings.

The reason it reads docume~1 is that DOS cannot handle file names longer than 8 characters (plus a 3-character extension), so "documents and settings" became "docume~1".

My recommendation is to keep some kind of translation table that translates docume~1 to its full name, or to take the full path as input, strip the name down to 6 characters and append "~1" to it.

  • 0
  xinok said:
"...but I'm not sure if Windows has other ways of linking files to each other."

Unfortunately, Windows does not have symlinks like Unix does; that is what I understand by linking to the same file.

Maybe I did not understand your post correctly, but unless this is part of some school assignment, what's the problem with using the 'comp' command?

I have it in the SendTo menu.

  • 0

NTFS supports both hard and soft links like ext2; these features are largely underused though.

There is an API function called GetShortPathName that will deal with 8.3 versus long names. The best way to detect all links and short/long names would be to check which directory entry the paths ultimately point to. I think this approach would work only on a per-file-system basis (e.g., you need different code for FAT32 and NTFS).

  • 0

Jayzee: I know what short file names are.

Andareed: Thanks for the GetShortPathName function

robotnic: I found this: CreateHardLink. It doesn't seem to exist in VC++ 6 though. I was hoping to create a hard link file, see what I can find out about them.

Taken from here:

A hard link to a file is indistinguishable from the original name for the file; there's no particular link that is more the "real name" for the file than any other.

I guess there isn't much I can do about hard link files.

Off Topic

Something that's just sort of bugging me: what's with all the typedefs and #defines in the C++ headers?

typedef LPCSTR LPCTSTR; typedef CONST CHAR *LPCSTR, *PCSTR; #define CONST const, typedef char CHAR; etc.

I really don't see the point. It just makes C++ harder to learn, since you have to memorize all these "types", and overall it makes code harder to read if you don't know what a certain typedef or define is.

  • 0

I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them.

If you use CreateHardLink, you'll probably need to install the Platform SDK and change the VC++ include/lib directories. Interestingly, there is a new CreateSymbolicLink on MSDN that only works with Longhorn.

@xinok: these names are generally acronyms. LPCTSTR means Long Pointer to a Const TCHAR STRing (null-terminated). TCHAR in turn is CHAR in ANSI builds and WCHAR in Unicode builds. CHAR is char because Win32 uses all caps for structures and primitives.
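
To make the typedef chain concrete, here is a small sketch that rebuilds a simplified version of it in a separate namespace, so it compiles without `<windows.h>`. The namespace `sketch`, the macro `SKETCH_UNICODE` (standing in for the real `UNICODE` switch) and the function `text_length` are all made up for illustration; the real headers contain many more aliases.

```cpp
#include <type_traits>

// Simplified stand-ins for the Windows aliases. The real headers also
// route "const" through the CONST macro, which is omitted here.
namespace sketch {
typedef char CHAR;
typedef wchar_t WCHAR;
typedef const CHAR* LPCSTR;    // pointer to a const narrow string
typedef const WCHAR* LPCWSTR;  // pointer to a const wide string
#ifdef SKETCH_UNICODE
typedef WCHAR TCHAR;
typedef LPCWSTR LPCTSTR;
#else
typedef CHAR TCHAR;
typedef LPCSTR LPCTSTR;
#endif
}  // namespace sketch

#ifndef SKETCH_UNICODE
// In the narrow-character configuration the alias is just const char*.
static_assert(std::is_same<sketch::LPCTSTR, const char*>::value,
              "LPCTSTR collapses to const char* in this configuration");
#endif

// Hypothetical function written against the alias: counts characters up
// to the terminating null, whatever TCHAR happens to be.
inline int text_length(sketch::LPCTSTR s) {
    int n = 0;
    while (s[n] != 0) ++n;
    return n;
}
```

The point of the indirection is that code written against `TCHAR` and `LPCTSTR` can be rebuilt for wide characters by flipping one macro, at the cost of the extra names the original question complains about.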

  • 0
  Andareed said:
"I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them."

I'd appreciate it if I can get those apps :yes: And thanks ahead of time.
  • 0

Alright, I was able to solve this little riddle. First, creating a hard link file can be done from a command in windows:

fsutil hardlink create c:\output.txt c:\input.txt

Now, testing whether both paths are the same file (I already tried this; the lock also applies to hard-linked files):

We have the inputfile and the outputfile.

First, open the inputfile while denying access to all other processes (lock the file).

Now try opening the outputfile. If it succeeds, the files are different. If it fails, continue...

Close the inputfile to unlock it, and try opening the outputfile again. If it succeeds this time, then the files are the same. If it fails again, something else is wrong, so return an error. :)
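
The three steps could be sketched roughly as follows. This version uses `CreateFileA` with a share mode of 0 for the exclusive lock, which is my substitution; the names `LockCompare` and `compare_by_locking` are hypothetical. Share modes are a Windows feature, so on other platforms the sketch only reports `Unsupported`.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

enum class LockCompare { Same, Different, Error, Unsupported };

// Hypothetical helper following the lock-then-open steps described above.
LockCompare compare_by_locking(const std::string& input,
                               const std::string& output) {
#ifdef _WIN32
    const DWORD share_all =
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE;

    // Step 1: open the input with share mode 0, which blocks any other
    // open that asks for read, write or delete access.
    HANDLE in = CreateFileA(input.c_str(), GENERIC_READ, 0, NULL,
                            OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (in == INVALID_HANDLE_VALUE) return LockCompare::Error;

    // Step 2: if the output still opens, it is not the locked file.
    HANDLE out = CreateFileA(output.c_str(), GENERIC_READ, share_all, NULL,
                             OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (out != INVALID_HANDLE_VALUE) {
        CloseHandle(out);
        CloseHandle(in);
        return LockCompare::Different;
    }

    // Step 3: release the lock and retry. Success now means the lock was
    // the only obstacle, so both paths name the same file.
    CloseHandle(in);
    out = CreateFileA(output.c_str(), GENERIC_READ, share_all, NULL,
                      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (out == INVALID_HANDLE_VALUE) return LockCompare::Error;
    CloseHandle(out);
    return LockCompare::Same;
#else
    // No equivalent of Windows share modes here; nothing is tested.
    (void)input;
    (void)output;
    return LockCompare::Unsupported;
#endif
}
```

Note that the result can be skewed if another process opens or locks either file between the steps, so this is best treated as a heuristic rather than a guarantee.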

  • 0

I couldn't find the soft link app, but I based it on the code from here: http://www.codeproject.com/w2k/junctionpoints.asp

The hardlink app just uses CreateHardLink. There is also a sysinternals tool with source called junction: http://www.sysinternals.com/Utilities/Junction.html

  • 0
  df_dukkar said:
"could u tell me how to lock down a file ??"

I used the OpenFile function.

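#include <windows.h>

const char* file = "file.dat";  // string literals are const

OFSTRUCT fileinfo;

// OF_SHARE_EXCLUSIVE denies other processes read and write access while the file stays open.
// OpenFile returns an HFILE, which equals HFILE_ERROR if the call failed.
HFILE handle = OpenFile(file, &fileinfo, OF_SHARE_EXCLUSIVE);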

  • 0
  Quote
PIDL's are only used in the shell.

That doesn't mean you can't still use them, if they turn out to be useful for your purpose.

You're right about hard and soft links, though. Maybe it's possible to use one of those Nt* APIs to determine the file block they 'link' to.

This topic is now closed to further replies.