Is this EXT2 parsing implementation correct?



I need to write a small application that checks whether a file exists on an EXT2/3 partition. I looked for a simple directory parsing implementation and came across this one from https://github.com/TheCodeArtist/ext2-parser/blob/master/ext-shell.c:

void ls(int fd, int base_inode_num)
{
    char* name;
    int curr_inode_num;
    int curr_inode_type;

    debug("data block addr\t= 0x%x\n", inodes[base_inode_num-1].i_block[0]);

    struct os_direntry_t* dirEntry = malloc(sizeof(struct os_direntry_t));
    assert(dirEntry != NULL);
    /* Seeks to the directory's first data block only (i_block[0]), assuming 1024-byte blocks */
    assert(lseek(fd, (off_t)(inodes[base_inode_num-1].i_block[0]*1024), SEEK_SET) == (off_t)(inodes[base_inode_num-1].i_block[0]*1024));
    assert(read(fd, (void *)dirEntry, sizeof(struct os_direntry_t)) == sizeof(struct os_direntry_t));

    /* Loops until it reads an entry whose inode field is 0 */
    while (dirEntry->inode) {

        /* The on-disk name is not NUL-terminated: copy it and terminate it */
        name = (char*)malloc(dirEntry->name_len+1);
        memcpy(name, dirEntry->file_name, dirEntry->name_len);
        name[dirEntry->name_len] = '\0';

        curr_inode_num = dirEntry->inode;
        curr_inode_type = dirEntry->file_type;

        /* Skip the rest of this record (signed arithmetic, the offset may be negative) and read the next entry */
        lseek(fd, (off_t)dirEntry->rec_len - (off_t)sizeof(struct os_direntry_t), SEEK_CUR);
        assert(read(fd, (void *)dirEntry, sizeof(struct os_direntry_t)) == sizeof(struct os_direntry_t));

        if (name[0] == '.') {
            if (name[1] == '.' || name[1] == '\0') {
                free(name);
                continue;
            }
        } else {
            debug("rec_len\t\t= %d\n", dirEntry->rec_len);
            debug("dirEntry->inode\t= %d\n", dirEntry->inode);
            printInodeType(curr_inode_type);
            printInodePerm(fd, curr_inode_num);
            printf("%d\t", curr_inode_num);
            printf("%s\t", name);
            printf("\n");
        }
        free(name);
    }

    free(dirEntry);
    return;
}

It seems to start reading at the first data block of the directory inode and keeps reading directory entry structures, but I don't understand how this is supposed to work if the directory contains more entries than can be stored in a single data block. Is this implementation incomplete or wrong, am I reading it incorrectly, or is there some EXT2/3 behavior I'm not aware of? Shouldn't the function read all of the directory inode's data blocks before starting to parse the entries?

Why are you trying to read the filesystem metadata directly? It seems potentially problematic and unnecessarily complicated. You can check whether a file exists on any filesystem Linux understands - whether that be EXT2, EXT3, EXT4, XFS, BTRFS, UDF, HFS, HFS+, UFS, FAT32, or NTFS - with a few simple commands. For example:

$ sudo mount -o ro /dev/sdb1 /mnt
$ find /mnt -name '*somefile.txt'
$ sudo umount /mnt
 

If your goal is to search the contents of a read-only image file like the project you linked to seems to suggest, you can also do that fairly simply using existing utilities. For example:

$ sudo kpartx -av some_hard_disk_image.img
$ sudo mount -o ro /dev/mapper/loop0p1 /mnt
$ find /mnt -name '*somefile.txt'
$ sudo umount /mnt
$ sudo kpartx -d some_hard_disk_image.img
  On 23/09/2013 at 20:00, xorangekiller said:

Why are you trying to read the filesystem metadata directly? It seems potentially problematic and unnecessarily complicated. You can check whether a file exists on any filesystem Linux understands - whether that be EXT2, EXT3, EXT4, XFS, BTRFS, UDF, HFS, HFS+, UFS, FAT32, or NTFS - with a few simple commands. For example:

I don't see what is complicated. I did the same for NTFS and FAT32 without any problem; it was actually much easier, since they aren't split up into all those annoying regions. While EXT2/3 are documented as well, the documentation is quite poor, since it doesn't cover the behavior of common implementations. In the end I just followed how it is implemented in the kernel sources. Being able to read directories sequentially, without first having to read their data as a file, would certainly have helped a lot. Having to ask users to install Cygwin or other pieces of garbage for a minor function, now that would have been complicated. With ~200 lines of code (not counting the structures) I solved the problem without any other annoying library or additional software.

So if I understand you properly, you were trying to read data from an EXT2 partition on Windows? In that case even installing Cygwin like you suggest would not have helped. Although I now understand why you couldn't use the sequence of basic commands I posted above, I stand by my assertion that you are going about this the wrong way. It sounds like Windows is not the right tool for the job.

 

That said, I would love to see your source code if you have it working. I'm assuming it is GPLv2 licensed since you took code from the kernel, of course.

  On 27/09/2013 at 00:24, xorangekiller said:

So if I understand you properly, you were trying to read data from an EXT2 partition on Windows?

On Linux there are several libraries available for that, which makes the whole ordeal quite pointless there.

 

  On 27/09/2013 at 00:24, xorangekiller said:

In that case even installing Cygwin like you suggest would not have helped.

Yes, it would, because those same libraries can be compiled and used there. But requiring something as horrible as Cygwin to be installed on users' machines just to save a few hundred lines of code didn't seem like a good idea.

 

  On 27/09/2013 at 00:24, xorangekiller said:

Although I now understand why you couldn't use the sequence of basic commands I posted above, I stand by my assertion that you are going about this the wrong way. It sounds like Windows is not the right tool for the job.

I don't think that requiring users to reboot into Linux every time they want to find a file is a solution either.

 


Following an implementation doesn't mean copying any lines of code. That code would have been entirely useless anyway, since it's driver code, and drivers are built for asynchronous access and are therefore full of additional abstractions, thread synchronization and memory paging. Following the implementation simply meant looking at which fields of the filesystem structures the EXT2 driver uses for the basic calculations that determine which data to read next. I found some noticeable differences between how the ext2 driver reads directories and both the code example I linked above and the many confused and incomplete specifications available.

 

I don't want to post the code since it relies on a lot of additional code to perform the disk reads and other application tasks, but I can describe the steps required to achieve the same thing, in case anybody with the same problem ever stumbles onto this thread:

  1. Read the superblock, which always starts 1024 bytes after the beginning of the partition.
  2. Calculate the block size by shifting the base block size left by the logarithmic block size value stored in the superblock (simply 1024 << LOGBLOCKVALUE).
  3. Calculate the logical block number of the superblock by dividing the superblock offset by the block size (this gives 1 for 1 KiB blocks and 0 for larger ones).
  4. Calculate the number of block groups by dividing the superblock's inodes count by its inodes-per-group value.
  5. Read all the block group descriptors into an array (they're stored sequentially, 32 bytes each on EXT2/3). The first descriptor is at BLOCKSIZE*(SUPERBLOCK_BLOCKNUMBER + 1).
  6. Read the root directory inode (always inode 2). To do that you need to know which inode group the inode belongs to (inodes are split into groups, one per block group), the position of that group's inode table, and the position of the inode within the table. Inode numbers start at 1 (0 is reserved), so in all these calculations you have to subtract 1 first. The group number is (INODENUMBER-1) divided by the superblock's INODES_PER_GROUP value. To find the position of the inode table, read the INODES_TABLE_BLOCK_NUMBER of that group from the block group descriptors array read earlier and multiply it by the block size. To find the byte offset of the inode within that table, first compute its index in the table as (INODENUMBER-1) modulo INODES_PER_GROUP, then multiply that index by the superblock's INODE_STRUCTURE_SIZE value.
  7. Next you need a function to read directories or files; reading both is exactly the same. Every inode has a BLOCKS array of 15 block numbers: the first 12 are direct block numbers, the 13th is the number of a block used entirely to store additional block numbers, and the 14th and 15th work the same way but with double and triple indirection. The inode also has a blocks count (i_blocks), but it is measured in 512-byte sectors and includes the indirect blocks themselves, so the number of data blocks to read is better derived from the file size (i_size) rounded up to a whole number of blocks. Block numbers equal to zero are unallocated blocks (holes, used in sparse files).
  8. Once you have the directory data, read the directory entries one by one, advancing by each entry's record length. The file name immediately follows the fixed part of the entry, and the file type is stored in the entry itself (when the filesystem has the filetype feature). Entries never cross a block boundary: the last entry in each block has a record length that extends to the end of the block, and the listing ends when you reach the end of the directory data. A record length of 0 never occurs in a valid directory, so treat it as corruption rather than as an end marker. If an entry has an inode value of 0, it is unused because the item has been deleted, so skip it rather than stopping (this also explains why recovering files on EXT2 is such a nightmare: you no longer know where the file is, and no other name is associated with it).
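Steps 1 to 4 can be sketched in C roughly as follows. This is a minimal sketch, not the poster's code: it assumes the caller has already read the 1024 bytes at partition offset 1024 into a buffer, and the `ext2_geometry` struct and `ext2_parse_geometry` function are hypothetical names. Field offsets follow the standard ext2 superblock layout, and values are decoded as little-endian so the code does not depend on host byte order.

```c
#include <stdint.h>

#define EXT2_SUPERBLOCK_OFFSET 1024u  /* step 1: superblock position in the partition */
#define EXT2_SUPER_MAGIC       0xEF53u

/* Decode little-endian on-disk integers independently of host endianness. */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct ext2_geometry {
    uint32_t block_size;        /* 1024 << s_log_block_size (step 2) */
    uint32_t superblock_block;  /* logical block holding the superblock (step 3) */
    uint32_t group_count;       /* s_inodes_count / s_inodes_per_group (step 4) */
    uint32_t inodes_per_group;
    uint32_t inode_size;        /* s_inode_size, or 128 on revision 0 filesystems */
};

/* sb: the 1024 bytes read at EXT2_SUPERBLOCK_OFFSET. Returns 0 on success, -1 otherwise. */
int ext2_parse_geometry(const uint8_t *sb, struct ext2_geometry *g)
{
    if (le16(sb + 56) != EXT2_SUPER_MAGIC)           /* s_magic */
        return -1;
    uint32_t inodes_count   = le32(sb + 0);           /* s_inodes_count */
    uint32_t log_block_size = le32(sb + 24);          /* s_log_block_size */
    g->inodes_per_group     = le32(sb + 40);          /* s_inodes_per_group */
    if (g->inodes_per_group == 0 || log_block_size > 6)  /* reject blocks above 64 KiB */
        return -1;
    g->block_size       = 1024u << log_block_size;
    g->superblock_block = EXT2_SUPERBLOCK_OFFSET / g->block_size;
    g->group_count      = inodes_count / g->inodes_per_group;
    /* s_rev_level is at byte 76; revision 1+ stores s_inode_size at byte 88. */
    g->inode_size = le32(sb + 76) >= 1 ? le16(sb + 88) : 128u;
    return 0;
}
```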
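Steps 5 and 6 reduce to a little arithmetic. A minimal sketch, assuming 32-byte group descriptors (the EXT2/3 size) and that the caller reads `bg_inode_table`, the 32-bit little-endian field at byte 8 of each descriptor, before calling `ext2_inode_offset`; all function names here are hypothetical:

```c
#include <stdint.h>

#define EXT2_GROUP_DESC_SIZE 32u  /* size of one ext2/ext3 block group descriptor */

/* Byte offset of the descriptor for `group` (step 5): the table starts in the
 * block right after the one holding the superblock. */
uint64_t ext2_group_desc_offset(uint32_t block_size, uint32_t superblock_block,
                                uint32_t group)
{
    return (uint64_t)block_size * (superblock_block + 1) +
           (uint64_t)group * EXT2_GROUP_DESC_SIZE;
}

/* Block group containing inode `ino` (step 6). Inode numbers start at 1. */
uint32_t ext2_inode_group(uint32_t ino, uint32_t inodes_per_group)
{
    return (ino - 1) / inodes_per_group;
}

/* Byte offset of inode `ino` on the partition, given bg_inode_table of its group. */
uint64_t ext2_inode_offset(uint32_t ino, uint32_t inodes_per_group, uint32_t inode_size,
                           uint32_t block_size, uint32_t inode_table_block)
{
    uint32_t index = (ino - 1) % inodes_per_group;  /* slot within the group's table */
    return (uint64_t)inode_table_block * block_size + (uint64_t)index * inode_size;
}
```

For the root directory this means `ext2_inode_group(2, ipg)` is always group 0 and the inode sits in slot 1 of that group's table.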

The longest part is resolving the block numbers, but I think the whole implementation can be done in 100 lines or fewer in an unmanaged language, especially if you don't need to read the full structures.
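Resolving the block numbers (step 7) is also what lets a directory span more than one block. A minimal sketch, not the poster's code: it maps one logical block index to a physical block number by walking the direct, single, double and triple indirect pointers. The `read_block_fn` callback is a hypothetical interface the caller supplies; `mem_read_block` backs it with an image already loaded into memory, purely for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Indexes into the inode's 15-entry i_block array. */
#define EXT2_NDIR_BLOCKS 12  /* i_block[0..11]: direct block numbers */
#define EXT2_IND_BLOCK   12  /* i_block[12]: single indirect */
#define EXT2_DIND_BLOCK  13  /* i_block[13]: double indirect */
#define EXT2_TIND_BLOCK  14  /* i_block[14]: triple indirect */

/* Hypothetical callback: read filesystem block `block` into buf. Returns 0 on success. */
typedef int (*read_block_fn)(void *ctx, uint32_t block, uint8_t *buf);

/* Illustrative reader over a filesystem image held in memory. */
struct mem_image { const uint8_t *data; size_t size; uint32_t block_size; };

int mem_read_block(void *ctx, uint32_t block, uint8_t *buf)
{
    const struct mem_image *m = ctx;
    uint64_t off = (uint64_t)block * m->block_size;
    if (off + m->block_size > m->size)
        return -1;
    memcpy(buf, m->data + off, m->block_size);
    return 0;
}

/* Map logical block `lblk` to a physical block. Returns 0 for a hole,
 * -1 on a read error or if lblk is beyond the triple-indirect range. */
int64_t ext2_bmap(const uint32_t i_block[15], uint32_t lblk, uint32_t block_size,
                  read_block_fn read_block, void *ctx)
{
    uint64_t per = block_size / 4;  /* 32-bit block numbers per indirect block */
    uint64_t n = lblk;
    uint32_t blk;
    int depth;

    if (n < EXT2_NDIR_BLOCKS)
        return i_block[n];
    n -= EXT2_NDIR_BLOCKS;
    if (n < per) {
        depth = 1;
        blk = i_block[EXT2_IND_BLOCK];
    } else if (n - per < per * per) {
        n -= per;
        depth = 2;
        blk = i_block[EXT2_DIND_BLOCK];
    } else {
        n -= per + per * per;
        if (n >= per * per * per)
            return -1;
        depth = 3;
        blk = i_block[EXT2_TIND_BLOCK];
    }

    uint8_t *buf = malloc(block_size);
    if (buf == NULL)
        return -1;
    /* Descend one indirection level at a time; a 0 pointer at any level is a hole. */
    for (int level = depth - 1; blk != 0 && level >= 0; level--) {
        uint64_t span = 1;  /* logical blocks covered by one entry at this level */
        for (int i = 0; i < level; i++)
            span *= per;
        if (read_block(ctx, blk, buf) != 0) {
            free(buf);
            return -1;
        }
        const uint8_t *p = buf + (n / span) * 4;
        n %= span;
        blk = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
              ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
    free(buf);
    return blk;
}
```

To read a whole directory, call `ext2_bmap` for every logical block from 0 up to `(i_size + block_size - 1) / block_size`, skipping results of 0 (holes).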
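Step 8 can be sketched as a lookup over one directory data block, which is all a "does this file exist" check needs. This is an illustration, not the poster's code: the function name is hypothetical, and it assumes block sizes below 64 KiB, where `rec_len` is stored as a plain 16-bit value. Unlike the code in the question, it skips entries whose inode is 0 instead of stopping at them, rejects a zero or out-of-range record length as corruption, and never reads past the end of the block.

```c
#include <stdint.h>
#include <string.h>

/* Fixed part of a directory entry: inode(4) rec_len(2) name_len(1) file_type(1). */
#define EXT2_DIRENT_HEADER 8u

/* Search one directory data block for `name`.
 * Returns the inode number if found, 0 if not in this block, -1 if the block is corrupt. */
int64_t ext2_dir_block_lookup(const uint8_t *blk, uint32_t block_size, const char *name)
{
    size_t want = strlen(name);
    uint32_t off = 0;

    while (off + EXT2_DIRENT_HEADER <= block_size) {
        const uint8_t *e = blk + off;
        uint32_t inode = (uint32_t)e[0] | ((uint32_t)e[1] << 8) |
                         ((uint32_t)e[2] << 16) | ((uint32_t)e[3] << 24);
        uint16_t rec_len = (uint16_t)(e[4] | (e[5] << 8));
        uint8_t name_len = e[6];  /* e[7] is file_type when the filetype feature is set */

        /* A valid record is at least 12 bytes, 4-byte aligned, holds its name,
         * and never crosses the end of the block. */
        if (rec_len < EXT2_DIRENT_HEADER + 4 || rec_len % 4 != 0 ||
            off + rec_len > block_size || EXT2_DIRENT_HEADER + name_len > rec_len)
            return -1;

        /* inode == 0 marks an unused (deleted) entry: skip it, don't stop. */
        if (inode != 0 && name_len == want &&
            memcmp(e + EXT2_DIRENT_HEADER, name, want) == 0)
            return inode;

        off += rec_len;
    }
    return 0;
}
```

The caller repeats this for each data block of the directory; a return of 0 from every block means the name is not present.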

This topic is now closed to further replies.