[C++] Test if two files are the same?


Question

I'm not looking to test if two files are duplicates; I know you can do that with a byte-by-byte comparison or by hashing both files.

What I'm looking for is a way to test if both files are literally the same file. For example:

c:\documents and settings\bla.txt

c:\docume~1\bla.txt

When comparing the strings, those might be seen as two different files, when really they're the same file. I could convert both strings to short file names, but I'm not sure if Windows has other ways of linking files to each other.

In brief, I need a foolproof way to test if two files are the same file or different.

https://www.neowin.net/forum/topic/349791-c-test-if-two-files-are-the-same/

21 answers to this question


  • 0

Well, I can just give you pseudo code:

Read both files and store the text inside into two separate strings

--> If string 1 == string 2

Do what you want to happen

You will have to check yourself, or let someone else find out how to read the files, since I haven't learned about reading files yet.
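For anyone who wants the pseudo code above in C++, here is a minimal sketch using standard file streams. The function name `same_contents` is made up for illustration, and unreadable files are simply treated as "not equal". Keep in mind this compares contents, so two separate copies of a file also match; it does not tell you whether two paths name the same file.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Returns true when both files can be opened and hold identical bytes.
// Note: this compares contents, not identity; two distinct copies also match.
bool same_contents(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;  // treat unreadable files as "not equal"

    std::istreambuf_iterator<char> it_a(a), it_b(b), end;
    // Walk both streams in lockstep and stop at the first differing byte.
    while (it_a != end && it_b != end) {
        if (*it_a != *it_b) return false;
        ++it_a;
        ++it_b;
    }
    // Equal only if both streams ran out at the same time.
    return it_a == end && it_b == end;
}
```

Calling `same_contents("a.txt", "b.txt")` with two hypothetical file names returns true only when every byte matches and both files have the same length.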

  • 0

So your question is how are you supposed to see if two files are identical without comparing contents of them!?

Btw, love your avatar - reminds me of when I was creating demos in DOS and I made an effect that looked exactly like your avatar... heh..

  • 0

So let me see if I'm getting the question.

You have 1 file but 2 different strings for the paths, as in:

Path 1 is C:\Docs\abc.txt

Path 2 is D:\Prog\Desktop\abc.txt

Put the two paths in 2 string arrays. Parse the arrays backwards until you reach the '\', then go forward from that point to the end of the array and store the result in 2 new arrays.

Now compare these 2 arrays to check if they are the same.

  • 0
  df_dukkar said:
So let me see if I'm getting the question.

You have 1 file but 2 different strings for the paths, as in:

Path 1 is C:\Docs\abc.txt

Path 2 is D:\Prog\Desktop\abc.txt

Put the two paths in 2 string arrays. Parse the arrays backwards until you reach the '\', then go forward from that point to the end of the array and store the result in 2 new arrays.

Now compare these 2 arrays to check if they are the same.

He wants to see if the files are the same, i.e. they contain the same stuff, but for some reason he doesn't want to open the files or use hashes like an MD5 sum.

  • 0

Sorry, let me reexplain :pinch:

c:\documents and settings\file.txt

c:\docume~1\file.txt

Technically, both are the same file, but doing a string comparison would say differently.

I have a function which has two parameters, inputfile and outputfile. If the outputfile is different than the inputfile, it will truncate the outputfile. But if the inputfile and outputfile are the same file, then it'll overwrite the current file.

I already mentioned that I could convert both strings to a short filename, but I'm unsure if there are other things to consider. For example, %systemroot%.

So I need a "foolproof" way of testing if the two strings both literally represent the same file. If converting to a short filename would be adequate, could someone give me a function that can do this? I was unable to find anything on Google or MSDN :blush:

And sorry for not being clearer in my first post :(

Btw, love your avatar - reminds me of when I was creating demos in DOS and I made an effect that looked exactly like your avatar... heh..

My avatar is an XOR effect, (X ^ Y) ;)

Edit: Just to be absolutely sure I don't confuse anyone again...

I'm NOT looking to compare the contents of files.

I'm only looking to see if two file paths are the same.

eg. c:\docume~1\file.txt IS c:\documents and settings\file.txt. Only difference is one is the short file name and the other is the long file name.

Edited by xinok
  • 0

I don't really get why you would want to do that, since they are always the same. That means instead of writing

cd "c:\documents and settings"

you can always use

cd c:\docume~1\

But if you really want to do that, then it's only a matter of stripping strings.

The reason it reads docume~1 is that DOS cannot handle file names longer than 8 characters, so "documents and settings" became "docume~1".

My recommendation is that you have some kind of translation table that translates docume~1 to its full name, or use the full path as input, strip it down to 6 characters and add "~1" to it.

  • 0
  xinok said:
but I'm not sure if Windows has other ways of linking files to eachother.

Unfortunately Windows does not have symlinks like in Unix - that is how I understand linking of similar (the same) files.

Maybe I did not understand your post correctly, but if this isn't part of some school assignment, what's the problem with using the 'comp' command?

I have it in the Send To menu..

  • 0

NTFS supports both hard and soft links like ext2; these features are largely underused though.

There is an API function called GetShortPathName that will deal with 8.3 vs long format. The best way to detect all links/short/long names would be to check which directory entry the paths ultimately point to. I think this approach would work only on a per-file-system basis (e.g., you need different code for FAT32 and NTFS).

  • 0

Jayzee: I know what short file names are.

Andareed: Thanks for the GetShortPathName function

robotnic: I found this: CreateHardLink. It doesn't seem to exist in VC++ 6 though. I was hoping to create a hard link file, see what I can find out about them.

Taken from here:

A hard link to a file is indistinguishable from the original name for the file; there's no particular link that is more the "real name" for the file than any other.

I guess there isn't much I can do about hard link files.

Off Topic

Something that's just sort of bugging me: what's with all the typedefs and #defines in the C++ headers?

typedef LPCSTR LPCTSTR; typedef CONST CHAR *LPCSTR, *PCSTR; #define CONST const, typedef char CHAR; etc.

I really don't see the point. It just makes C++ harder to learn trying to memorize all these "types", and overall makes code harder to read if you don't know what a certain typedef or define is.

  • 0

I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them.

If you use CreateHardLink, you'll probably need to install the Platform SDK and change the VC++ include/lib directories. Interestingly, there is a new CreateSymbolicLink on MSDN that only works with Longhorn.

@xinok: these namings are generally acronyms. LPCTSTR means Long Pointer to a Const TCHAR STRing. TCHAR itself is CHAR in ANSI builds and WCHAR in UNICODE builds. CHAR is char because Win32 uses all caps for structures and primitives.
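To make the naming scheme concrete, here is a small sketch of how such a typedef chain fits together. These are stand-in aliases in a made-up `sketch` namespace that mirror the idea, not the actual contents of the Windows headers; the only assumption carried over is that the `UNICODE` macro selects between the narrow and wide variants.

```cpp
#include <type_traits>

// Stand-in aliases in their own namespace so they cannot clash with <windows.h>.
namespace sketch {
using CHAR    = char;          // Win32 spells primitives in capitals
using WCHAR   = wchar_t;
using LPCSTR  = const CHAR*;   // pointer to a constant narrow (ANSI) string
using LPCWSTR = const WCHAR*;  // pointer to a constant wide string

#ifdef UNICODE
using TCHAR   = WCHAR;         // wide characters in UNICODE builds
using LPCTSTR = LPCWSTR;
#else
using TCHAR   = CHAR;          // plain char in ANSI builds
using LPCTSTR = LPCSTR;
#endif
}  // namespace sketch

#ifndef UNICODE
// Without UNICODE defined, the whole chain collapses to const char*.
static_assert(std::is_same<sketch::LPCTSTR, const char*>::value,
              "LPCTSTR is const char* in ANSI builds");
#endif
```

The point of the indirection is that code written against TCHAR and LPCTSTR can be compiled for either character width without changes.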

  • 0
  Andareed said:
I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them.

I'd appreciate it if I can get those apps :yes: And thanks ahead of time.
  • 0

Alright, I was able to solve this little riddle. First, creating a hard link file can be done from a command in windows:

fsutil hardlink create c:\output.txt c:\input.txt

Now, testing whether two paths are the same file (I already tried this; the lock also applies to hard-linked names):

We have the inputfile and outputfile

First, open the inputfile, but deny access to all other processes (lock the file)

Now try opening the outputfile. If it succeeds, the files are different. If it fails, continue...

Unlock the inputfile, and try opening the outputfile again. If it succeeds this time, then the files are the same. If it fails again, something else is wrong so return an error. :)

  • 0

Couldn't find softlink app but I based it on the code from here: http://www.codeproject.com/w2k/junctionpoints.asp

The hardlink app just uses CreateHardLink. There is also a sysinternals tool with source called junction: http://www.sysinternals.com/Utilities/Junction.html

  • 0
  df_dukkar said:
Could you tell me how to lock down a file?

I used the OpenFile function.

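#include <windows.h>

const char* file = "file.dat";

OFSTRUCT fileinfo;

// OF_SHARE_EXCLUSIVE opens the file and denies other processes read and write access.
HFILE handle = OpenFile(file, &fileinfo, OF_SHARE_EXCLUSIVE);

// OpenFile returns HFILE_ERROR if the file could not be opened.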

  • 0
  Quote
PIDLs are only used in the shell.

That doesn't mean that you can't still use them, if they turn out to be useful for your purpose.

You're right about hard and soft links, though. Maybe it's possible to use one of those Nt* APIs to determine the file block they 'link' to.

This topic is now closed to further replies.