[C++] Test if two files are the same?


Question

I'm not looking to test if two files are duplicates; I know you can do that with a byte-by-byte comparison or by hashing both files.

What I'm looking for is a way to test if both files are literally the same file. For example:

c:\documents and settings\bla.txt

c:\docume~1\bla.txt

When comparing the strings, those might be seen as two different files, when really they're the same file. I could convert both strings to short file names, but I'm not sure if Windows has other ways of linking files to each other.

In brief, I need a foolproof way to test if two files are the same file or different.

21 answers to this question

  • 0

Well, I can just give you pseudocode:

Read both files and store the text inside into two separate strings

--> If string 1 == string 2

Do what you want to happen

You will have to check yourself, or let someone else explain how to read the files, since I haven't learned about reading files yet.

  • 0

So your question is how are you supposed to see if two files are identical without comparing contents of them!?

Btw, love your avatar - reminds me of when I was creating demos in DOS and I made effect that looked exactly as your avatar... heh..

  • 0

so let me see if i'm getting the question.

u have 1 file but have 2 different strings for the paths:

as in

Path 1 is C:\Docs\abc.txt

Path 2 is D:\Prog\Desktop\abc.txt

put the two paths in 2 string arrays. parse the arrays backwards till u reach the '\', from that point on again go forward till the end of the array and store this in 2 new arrays.

now compare these 2 arrays to chk if they are the same.

  • 0
  df_dukkar said:
so let me see if i'm getting the question.

u have 1 file but have 2 different strings for the paths:

as  in

Path 1 is C:\Docs\abc.txt

Path 2 is D:\Prog\Desktop\abc.txt

put the two paths in 2 string arrays. parse the arrays backwards till u reach the '\', from that point on again go forward till the end of the array and store this in 2 new arrays.

now compare these 2 arrays to chk if they are the same.

He wants to see if the files are the same, i.e. they contain the same stuff, but for some reason he doesn't want to open the files or use hashes like an MD5 sum.

  • 0

Sorry, let me reexplain :pinch:

c:\documents and settings\file.txt

c:\docume~1\file.txt

Technically, both are the same file, but doing a string comparison would say differently.

I have a function which has two parameters, inputfile and outputfile. If the outputfile is different than the inputfile, it will truncate the outputfile. But if the inputfile and outputfile are the same file, then it'll overwrite the current file.

I already mentioned that I could convert both strings to a short filename, but I'm unsure if there are other things to consider. For example, %systemroot%.

So I need a "foolproof" way of testing if the two strings both literally represent the same file. If converting to a short filename would be adequate, could someone give me a function that can do this? I was unable to find anything on Google or MSDN :blush:

And sorry for not being clearer in my first post :(

Btw, love your avatar - reminds me of when I was creating demos in DOS and I made effect that looked exactly as your avatar... heh..

My avatar is a XOR effect, (X ^ Y) ;)

Edit: Just to be absolutely sure I don't confuse anyone again...

I'm NOT looking to compare the contents of files.

I'm only looking to see if two file paths are the same.

eg. c:\docume~1\file.txt IS c:\documents and settings\file.txt. Only difference is one is the short file name and the other is the long file name.

Edited by xinok
  • 0

I don't really get why you would want to do that, since they are always the same. That means instead of writing

cd "c:\documents and settings"

you can always use

cd c:\docume~1\

But if you really want to do that, then it's only a matter of stripping strings.

The reason it shows up as docume~1 is that DOS cannot handle filenames longer than 8 characters, so "documents and settings" became "docume~1".

My recommendation is that you have some kind of translation table that translates docume~1 to its full name, or use the full path as input, strip it down to 6 characters, and add "~1" to it.

  • 0
  xinok said:
but I'm not sure if Windows has other ways of linking files to eachother.

Unfortunately, Windows does not have symlinks like Unix does, which is what I understand by linking files to each other.

Maybe I did not understand your post correctly, but unless this is part of some school assignment, what's the problem with using the 'comp' command?

I have it in the SendTo menu.

  • 0

NTFS supports both hard and soft links like ext2; these features are largely underused though.

There is an API function called GetShortPathName that will deal with 8.3 vs. long format. The best way to detect all links/short/long names would be to check which directory entry the paths ultimately point to. I think this approach would work only on a per-file-system basis (e.g., you need different code for FAT32 and NTFS).
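The "check what the paths ultimately point to" idea might look something like the sketch below. It is not code from this thread: same_file_identity and open_for_query are hypothetical names, and the sketch assumes that comparing the volume serial number and file index from GetFileInformationByHandle is acceptable for the file systems involved. The non-Windows branch (device and inode numbers from stat) is there only so the sketch also builds off Windows.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: opens a path for metadata queries only,
// sharing it fully so other users of the file are not blocked.
static HANDLE open_for_query(const std::string& path) {
    return CreateFileA(path.c_str(), FILE_READ_ATTRIBUTES,
                       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                       NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
}
#else
#include <sys/stat.h>
#endif

// Returns true if both paths resolve to the same file on disk.
// Returns false if they differ or if either path cannot be queried.
bool same_file_identity(const std::string& a, const std::string& b) {
#ifdef _WIN32
    HANDLE ha = open_for_query(a);
    if (ha == INVALID_HANDLE_VALUE) return false;
    HANDLE hb = open_for_query(b);
    if (hb == INVALID_HANDLE_VALUE) { CloseHandle(ha); return false; }

    // Both handles stay open while the identifiers are read and compared.
    BY_HANDLE_FILE_INFORMATION ia, ib;
    const bool ok = GetFileInformationByHandle(ha, &ia) &&
                    GetFileInformationByHandle(hb, &ib);
    CloseHandle(ha);
    CloseHandle(hb);

    return ok &&
           ia.dwVolumeSerialNumber == ib.dwVolumeSerialNumber &&
           ia.nFileIndexHigh == ib.nFileIndexHigh &&
           ia.nFileIndexLow == ib.nFileIndexLow;
#else
    struct stat sa, sb;
    if (stat(a.c_str(), &sa) != 0 || stat(b.c_str(), &sb) != 0) return false;
    // Same device and same inode means the same file.
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
#endif
}
```

Because the comparison is made on what the file system reports rather than on the strings, a long name, its 8.3 short name, and a hard link to the same file should all compare equal under these assumptions.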

  • 0

Jayzee: I know what short file names are.

Andareed: Thanks for the GetShortPathName function

robotnic: I found this: CreateHardLink. It doesn't seem to exist in VC++ 6 though. I was hoping to create a hard link file, see what I can find out about them.

Taken from here:

A hard link to a file is indistinguishable from the original name for the file; there's no particular link that is more the "real name" for the file than any other.

I guess there isn't much I can do about hard link files.

Off Topic

Something that's just sort of bugging me: what's with all the typedefs and #defines in the C++ headers?

typedef LPCSTR LPCTSTR; typedef CONST CHAR *LPCSTR, *PCSTR; #define CONST const, typedef char CHAR; etc.

I really don't see the point. It just makes C++ harder to learn trying to memorize all these "types", and overall makes code harder to read if you don't know what a certain typedef or define is.

  • 0

I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them.

If you use CreateHardLink, you'll probably need to install the Platform SDK and change the VC++ include/lib directories. Interestingly, there is a new CreateSymbolicLink on MSDN that only works with Longhorn.

@xinok: these names are generally acronyms. LPCTSTR means Long Pointer to a Const TCHAR STRing (null-terminated). TCHAR itself is CHAR in ANSI builds and WCHAR in Unicode builds. CHAR is char because Win32 uses all caps for structures and primitives.
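To make the ANSI/Unicode switch a bit more concrete, here is a small sketch of the pattern. Every name in it (DemoChar, DemoWChar, DemoTChar, DemoLPCTStr, DEMO_UNICODE, DEMO_TEXT, demo_length) is a made-up stand-in so that it compiles without the Windows headers; the real headers use CHAR, WCHAR, TCHAR, LPCTSTR, and the UNICODE macro, and are laid out differently in detail.

```cpp
#include <cassert>

// Hypothetical stand-ins that mirror the pattern in the Windows headers.
typedef char    DemoChar;    // plays the role of CHAR
typedef wchar_t DemoWChar;   // plays the role of WCHAR

#ifdef DEMO_UNICODE          // plays the role of the UNICODE macro
typedef DemoWChar DemoTChar; // character type in a "Unicode" build
#define DEMO_TEXT(s) L##s    // turns "abc" into L"abc"
#else
typedef DemoChar DemoTChar;  // character type in an "ANSI" build
#define DEMO_TEXT(s) s       // leaves "abc" unchanged
#endif

// Pointer to a constant, null-terminated string of DemoTChar,
// which is the idea behind LPCTSTR.
typedef const DemoTChar* DemoLPCTStr;

// Counts characters up to the terminating null, whichever width is in use.
inline unsigned demo_length(DemoLPCTStr s) {
    unsigned n = 0;
    while (s[n] != 0) ++n;
    return n;
}
```

The point of the indirection is that code written against the T names, such as demo_length(DEMO_TEXT("bla.txt")), compiles unchanged in either build; only the macro decides which character width is used.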

  • 0
  Andareed said:
I've actually written 2 apps for creating soft and hard links. If anyone wants, I can post them.

I'd appreciate it if I can get those apps :yes: And thanks ahead of time.
  • 0

Alright, I was able to solve this little riddle. First, creating a hard link file can be done from a command in windows:

fsutil hardlink create c:\output.txt c:\input.txt

Now, testing whether two paths are the same file (I already tried this; the lock also applies to hard-linked names):

We have the inputfile and outputfile

First, open the inputfile, but deny access to all other processes (lock the file)

Now try opening the outputfile. If it succeeds, the files are different. If it fails, continue...

Unlock the inputfile, and try opening the outputfile again. If it succeeds this time, then the files are the same. If it fails again, something else is wrong so return an error. :)
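For anyone on a C++17 compiler, the same question can also be asked without locking anything. The sketch below swaps in std::filesystem::equivalent, which is a different technique from the lock-and-retry test described above; is_same_file is a hypothetical helper name, and treating any error (such as a missing file) as "not the same" is an assumption made for this example.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// True only when both paths exist and resolve to the same file system object.
// Any error, for example a missing file, is reported as "not the same".
bool is_same_file(const fs::path& a, const fs::path& b) {
    std::error_code ec;
    const bool same = fs::equivalent(a, b, ec);
    return !ec && same;
}
```

In the inputfile/outputfile function discussed earlier, a call such as is_same_file(inputfile, outputfile) could decide whether the output should be truncated or overwritten in place. Long and 8.3 spellings of one path and hard links should compare equal, since the check is made on the resolved file rather than on the strings, but that is worth verifying on the target compiler.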

  • 0

Couldn't find softlink app but I based it on the code from here: http://www.codeproject.com/w2k/junctionpoints.asp

The hardlink app just uses CreateHardLink. There is also a sysinternals tool with source called junction: http://www.sysinternals.com/Utilities/Junction.html

  • 0
  df_dukkar said:
could u tell me how to lock down a file ??

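I used the OpenFile function.

#include <windows.h>

const char* file = "file.dat";

OFSTRUCT fileinfo;

// OF_SHARE_EXCLUSIVE denies all other processes read and write access while the file is open.
HFILE handle = OpenFile(file, &fileinfo, OF_SHARE_EXCLUSIVE);

// handle is HFILE_ERROR if the file could not be opened (for example, because it is already locked).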

  • 0
  Quote
PIDL's are only used in the shell.

That doesn't mean that you can't still use them, if they turn out to be useful for your purpose.

You're right about hard and soft links tho. Maybe it's possible to use one of those Nt* APIs to determine the file block they 'link' to.

This topic is now closed to further replies.