• 0

[C++] Test if two files are the same?


Question

I'm not looking to test if two files are duplicates, I know you can do that by doing a byte by byte comparison, or hashing both files.

What I'm looking for is a way to test if both files are literally the same file. For example:

c:\documents and settings\bla.txt

c:\docume~1\bla.txt

When comparing the strings, those might be seen as two different files, when really they're the same file. I could convert both strings to short file names, but I'm not sure if Windows has other ways of linking files to each other.

In brief, I need a foolproof way to test if two files are the same file or different.

https://www.neowin.net/forum/topic/349791-c-test-if-two-files-are-the-same/

  • 0

Well, I can only give you pseudocode:

Read both files and store their contents in two separate strings

--> If string 1 == string 2

Do what you want to happen

You will have to work out how to read the files yourself, or let someone else explain it, since I haven't learned about reading files yet.

  • 0

So your question is how you are supposed to tell whether two files are identical without comparing their contents!?

Btw, love your avatar - reminds me of when I was creating demos in DOS and I made an effect that looked exactly like your avatar... heh..

  • 0

So let me see if I'm getting the question.

You have 1 file but 2 different strings for the paths, as in:

Path 1 is C:\Docs\abc.txt

Path 2 is D:\Prog\Desktop\abc.txt

Put the two paths in 2 string arrays. Parse each array backwards until you reach the '\', then go forward from that point to the end of the array and store the result in 2 new arrays.

Now compare these 2 new arrays to check if they are the same.

  • 0
  df_dukkar said:
So let me see if I'm getting the question.

You have 1 file but 2 different strings for the paths, as in:

Path 1 is C:\Docs\abc.txt

Path 2 is D:\Prog\Desktop\abc.txt

Put the two paths in 2 string arrays. Parse each array backwards until you reach the '\', then go forward from that point to the end of the array and store the result in 2 new arrays.

Now compare these 2 new arrays to check if they are the same.


He wants to see if the files are the same, i.e. they contain the same stuff, but for some reason he doesn't want to open the files or use hashes like an MD5 sum.

  • 0

Sorry, let me reexplain :pinch:

c:\documents and settings\file.txt

c:\docume~1\file.txt

Technically, both are the same file, but doing a string comparison would say differently.

I have a function which has two parameters, inputfile and outputfile. If the outputfile is different than the inputfile, it will truncate the outputfile. But if the inputfile and outputfile are the same file, then it'll overwrite the current file.

I already mentioned that I could convert both strings to a short filename, but I'm unsure if there are other things to consider. For example, %systemroot%.

So I need a "foolproof" way of testing if the two strings both literally represent the same file. If converting to a short filename would be adequate, could someone give me a function that can do this? I was unable to find anything on Google or MSDN :blush:

And sorry for not being clearer in my first post :(

  Quote
Btw, love your avatar - reminds me of when I was creating demos in DOS and I made an effect that looked exactly like your avatar... heh..

My avatar is an XOR effect, (X ^ Y) ;)

Edit: Just to be absolutely sure I don't confuse anyone again...

I'm NOT looking to compare the contents of files.

I'm only looking to see if two file paths are the same.

eg. c:\docume~1\file.txt IS c:\documents and settings\file.txt. Only difference is one is the short file name and the other is the long file name.

Edited by xinok
  • 0

I don't really get why you would want to do that, since they are always the same. That means instead of writing

cd "c:\documents and settings"

you can always use

cd c:\docume~1\

But if you really want to do that, then it's only a matter of stripping strings.

The reason it reads docume~1 is that DOS cannot handle file names longer than 8 characters, so "documents and settings" became "docume~1".

My recommendation is that you either keep some kind of translation table that translates docume~1 to its full name, or take the full path as input, strip the name down to 6 characters and add "~1" to it.

  • 0
  xinok said:
but I'm not sure if Windows has other ways of linking files to each other.


Unfortunately Windows does not have symlinks like Unix does, which is how I understand a link between names for the same file.

Maybe I did not understand your post correctly, but if this is not part of some school assignment, what's the problem with using the 'comp' command?

I have it in the SendTo menu..

  • 0

NTFS supports both hard and soft links like ext2; these features are largely underused though.

There is an API function called GetShortPathName that will deal with 8.3 vs long format. The best way to detect all links/short/long names would be to check which directory entry the paths ultimately point to. I think this approach would work only on a per-file-system basis (e.g., you need different code for FAT32 and NTFS).

  • 0

Jayzee: I know what short file names are.

Andareed: Thanks for the GetShortPathName function

robotnic: I found this: CreateHardLink. It doesn't seem to exist in VC++ 6 though. I was hoping to create a hard link file, see what I can find out about them.

Taken from here:

A hard link to a file is indistinguishable from the original name for the file; there's no particular link that is more the "real name" for the file than any other.

I guess there isn't much I can do about hard link files.

Off Topic

Something that's just sort of bugging me: what's with all the typedefs and #defines in the C++ headers?

typedef LPCSTR LPCTSTR; typedef CONST CHAR *LPCSTR, *PCSTR; #define CONST const, typedef char CHAR; etc.

I really don't see the point. It just makes C++ harder to learn trying to memorize all these "types", and overall makes code harder to read if you don't know what a certain typedef or define is.

  • 0

I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them.

If you use CreateHardLink, you'll probably need to install the Platform SDK and change the VC++ include/lib directories. Interestingly, there is a new CreateSymbolicLink on MSDN that only works with Longhorn.

@xinok: these names are generally acronyms. LPCTSTR means Long Pointer to a Const TCHAR STRing (null-terminated). Another one is TCHAR, which is CHAR in ANSI builds and WCHAR in UNICODE builds. CHAR is char because Win32 uses all caps for structures and primitives.
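To make the chain of names concrete, here is a small sketch that re-declares the typedefs quoted earlier in a local namespace (so it does not need or clash with windows.h) and checks what they boil down to. The namespace sketch and the helper length are hypothetical, and an ANSI build is assumed; in a UNICODE build LPCTSTR would map to a wide-character string instead.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {
// Stand-ins for the header declarations, in dependency order.
// The real headers write CONST, which is #defined to const.
typedef char CHAR;
typedef const CHAR *LPCSTR, *PCSTR;
typedef LPCSTR LPCTSTR;  // ANSI build assumed

// After the compiler resolves the typedefs, only a plain pointer type is left.
static_assert(std::is_same<LPCTSTR, const char*>::value,
              "in an ANSI build LPCTSTR is just const char*");

// A parameter declared as LPCTSTR therefore accepts an ordinary C string.
std::size_t length(LPCTSTR text) { return std::strlen(text); }
}  // namespace sketch
```

So the extra names add no new types; they mainly let the same source compile for either character set.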

  • 0
  Andareed said:
I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them.


I'd appreciate it if I can get those apps :yes: And thanks ahead of time.
  • 0

Alright, I was able to solve this little riddle. First, a hard link can be created from the command line in Windows:

fsutil hardlink create c:\output.txt c:\input.txt

Now testing for duplicates (I already tried this, it also locks hard link files):

We have the inputfile and outputfile

First, open the inputfile, but deny access to all other processes (lock the file)

Now try opening the outputfile. If it succeeds, the files are different. If it fails, continue...

Unlock the inputfile, and try opening the outputfile again. If it succeeds this time, then the files are the same. If it fails again, something else is wrong so return an error. :)
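The steps above can be sketched with CreateFile share modes. This is a hedged illustration under assumptions, not the code actually used here: the names LockTestResult and same_file_by_locking are hypothetical, both files must already exist, and a sharing violation caused by some other process would be misread. Share modes are a Windows feature, so on other platforms the function only reports that the test is unsupported.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

enum class LockTestResult { Same, Different, Error, Unsupported };

LockTestResult same_file_by_locking(const char* input, const char* output) {
#ifdef _WIN32
    // Step 1: open the input with share mode 0, so no other open for
    // read access can succeed on this file while the handle is held.
    HANDLE in = CreateFileA(input, GENERIC_READ, 0, nullptr,
                            OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (in == INVALID_HANDLE_VALUE) return LockTestResult::Error;

    auto open_output = [output]() {
        return CreateFileA(output, GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    };

    // Step 2: if the output opens while the input is locked, they differ.
    HANDLE out = open_output();
    if (out != INVALID_HANDLE_VALUE) {
        CloseHandle(out);
        CloseHandle(in);
        return LockTestResult::Different;
    }

    // Step 3: release the lock and retry. Success now means the lock was
    // what blocked the first attempt, so both names refer to one file.
    CloseHandle(in);
    out = open_output();
    if (out == INVALID_HANDLE_VALUE) return LockTestResult::Error;
    CloseHandle(out);
    return LockTestResult::Same;
#else
    (void)input;
    (void)output;
    return LockTestResult::Unsupported;
#endif
}
```

The output is opened with read access on purpose: an open that requests no access at all would not be subject to the sharing check, and the test would always report "different".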

  • 0

Couldn't find softlink app but I based it on the code from here: http://www.codeproject.com/w2k/junctionpoints.asp

The hardlink app just uses CreateHardLink. There is also a sysinternals tool with source called junction: http://www.sysinternals.com/Utilities/Junction.html

  • 0
  df_dukkar said:
Could you tell me how to lock down a file?


I used the OpenFile function.

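#include <windows.h>

const char* file = "file.dat";

OFSTRUCT fileinfo;

// OF_SHARE_EXCLUSIVE denies other processes read and write access while the file is open.
// OpenFile returns an HFILE, or HFILE_ERROR on failure.
HFILE handle = OpenFile(file, &fileinfo, OF_SHARE_EXCLUSIVE);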

  • 0
  Quote
PIDL's are only used in the shell.

That doesn't mean you can't still use them, if they turn out to be useful for your purpose.

You're right about hard and soft links though. Maybe it's possible to use one of those Nt* APIs to determine the file block they 'link' to.

This topic is now closed to further replies.