Is this EXT2 parsing implementation correct?



I need to write a small application that checks whether a file exists on an EXT2/3 partition. I looked for a simple directory parsing implementation and came across this one from https://github.com/TheCodeArtist/ext2-parser/blob/master/ext-shell.c:

void ls(int fd, int base_inode_num)
{
    char* name;
    int curr_inode_num;
    int curr_inode_type;

    debug("data block addr\t= 0x%x\n", inodes[base_inode_num-1].i_block[0]);

    struct os_direntry_t* dirEntry = malloc(sizeof(struct os_direntry_t));
    assert (dirEntry != NULL);
    /* Seek to the first direct data block only; a 1024-byte block size is hard-coded. */
    assert(lseek(fd, (off_t)(inodes[base_inode_num-1].i_block[0]*1024), SEEK_SET) == (off_t)(inodes[base_inode_num-1].i_block[0]*1024));
    assert(read(fd, (void *)dirEntry, sizeof(struct os_direntry_t)) == sizeof(struct os_direntry_t));

    /* Walk the entries until one with inode 0 is found. */
    while (dirEntry->inode) {

        /* Copy the name and terminate it inside the allocated buffer. */
        name = (char*)malloc(dirEntry->name_len+1);
        assert(name != NULL);
        memcpy(name, dirEntry->file_name, dirEntry->name_len);
        name[dirEntry->name_len] = '\0';

        curr_inode_num = dirEntry->inode;
        curr_inode_type = dirEntry->file_type;

        /* Skip the rest of this record and read the next entry. */
        lseek(fd, (dirEntry->rec_len - sizeof(struct os_direntry_t)), SEEK_CUR);
        assert(read(fd, (void *)dirEntry, sizeof(struct os_direntry_t)) == sizeof(struct os_direntry_t));

        if (name[0] == '.') {
            if ( name[1]=='.' || name[1]=='\0') {
                free(name);
                continue;
            }
        } else {
            debug("rec_len\t\t= %d\n", dirEntry->rec_len);
            debug("dirEntry->inode\t= %d\n",dirEntry->inode);
            printInodeType(curr_inode_type);
            printInodePerm(fd, curr_inode_num);
            printf("%d\t", curr_inode_num);
            printf("%s\t", name);
            printf("\n");
        }
        free(name);
    }

    free(dirEntry);
    return;
}

It seems to start reading at the first data block of the directory inode and keeps reading directory entry structures; however, I don't understand how this is supposed to work if the directory contains more entries than can be stored in a single data block. Is this implementation incomplete/wrong, am I reading it incorrectly, or is there some EXT2/3 behavior I'm not aware of? Shouldn't the function read the directory inode's list of data blocks before starting to read the entries?
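To make the concern concrete, here is a minimal sketch in C. It assumes the directory's length in bytes is taken from its inode (the i_size field in the kernel headers); the helper name dir_block_count is made up for illustration. Any result above 1 means there are entries that a lookup of i_block[0] alone cannot reach.

```c
#include <assert.h>
#include <stdint.h>

/* Number of data blocks a directory occupies, rounded up from its byte size.
 * dir_size is assumed to come from the directory inode (i_size). */
uint32_t dir_block_count(uint32_t dir_size, uint32_t block_size)
{
    assert(block_size != 0);
    /* 64-bit arithmetic so the rounding cannot overflow. */
    return (uint32_t)(((uint64_t)dir_size + block_size - 1u) / block_size);
}
```

For example, a 12288-byte directory on a filesystem with 4096-byte blocks spans three blocks, so two of them would never be looked up by code that only uses i_block[0].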

Why are you trying to read the filesystem metadata directly? It seems potentially problematic and unnecessarily complicated. You can check whether a file exists on any filesystem Linux understands - whether that be EXT2, EXT3, EXT4, XFS, BTRFS, UDF, HFS, HFS+, UFS, FAT32, or NTFS - with a few simple commands. For example:

$ sudo mount -o ro /dev/sdb1 /mnt
$ find /mnt -name '*somefile.txt'
$ sudo umount /mnt
 

If your goal is to search the contents of a read-only image file like the project you linked to seems to suggest, you can also do that fairly simply using existing utilities. For example:

$ sudo kpartx -av some_hard_disk_image.img
$ sudo mount -o ro /dev/mapper/loop0p1 /mnt
$ find /mnt -name '*somefile.txt'
$ sudo umount /mnt
$ sudo kpartx -d some_hard_disk_image.img
  On 23/09/2013 at 20:00, xorangekiller said:

Why are you trying to read the filesystem metadata directly? It seems potentially problematic and unnecessarily complicated. You can check whether a file exists on any filesystem Linux understands - whether that be EXT2, EXT3, EXT4, XFS, BTRFS, UDF, HFS, HFS+, UFS, FAT32, or NTFS - with a few simple commands. For example:

I don't see what is complicated. I did the same for NTFS and FAT32 without any problem; it was actually quite a bit easier since they aren't split up into all those annoying regions. While EXT2/3 are documented as well, the documentation is quite bad since it doesn't cover the behavior of the common implementations. In the end I just followed how it was implemented in the kernel sources; being able to read directories in sequence without having to read the data as a file first would certainly have helped a lot. Having to ask users to install Cygwin or other pieces of garbage for a minor function, now that would have been complicated. With ~200 lines of code (ignoring the structures) I solved the problem without any other annoying library or additional software.

So if I understand you properly, you were trying to read data from an EXT2 partition on Windows? In that case even installing Cygwin like you suggest would not have helped. Although I now understand why you couldn't use the sequence of basic commands I posted above, I stand by my assertion that you are going about this the wrong way. It sounds like Windows is not the right tool for the job.

 

That said, I would love to see your source code if you have it working. I'm assuming it is GPLv2 licensed since you took code from the kernel, of course.

  On 27/09/2013 at 00:24, xorangekiller said:

So if I understand you properly, you were trying to read data from an EXT2 partition on Windows?

On Linux there are several libraries available for that, which makes the whole ordeal quite pointless.

 

  On 27/09/2013 at 00:24, xorangekiller said:

In that case even installing Cygwin like you suggest would not have helped.

Yes it would, because those same libraries can be compiled and used there. But requiring something as horrible as Cygwin to be installed on user machines just to save a few hundred lines of code didn't really seem like a good idea.

 

  On 27/09/2013 at 00:24, xorangekiller said:

Although I now understand why you couldn't use the sequence of basic commands I posted above, I stand by my assertion that you are going about this the wrong way. It sounds like Windows is not the right tool for the job.

I don't think that requiring users to reboot back to Linux every time they want to find a file is a solution either.

 

  On 27/09/2013 at 00:24, xorangekiller said:

Although I now understand why you couldn't use the sequence of basic commands I posted above, I stand by my assertion that you are going about this the wrong way. It sounds like Windows is not the right tool for the job.

Following an implementation doesn't mean copying any line of code. That code would have been entirely useless anyway since it's driver code, and drivers are made for asynchronous access and are therefore full of additional abstractions, thread synchronization and memory paging. Following the implementation simply meant looking at which fields of the filesystem structures the EXT2 driver reads and how it does the basic calculations to find out which data to read next. I found some noticeable differences between how the ext2 driver handles reading directories and the code example I linked above, and also the many confused and incomplete specifications available.

 

I don't want to post the code since it relies on a lot of additional code to perform the disk reads and other application tasks, but I can describe the steps required to obtain the same result, in case anybody with the same problem ever stumbles into this thread:

  1. Read the SuperBlock, which is always 1024 bytes after the beginning of the partition.
  2. Calculate the block size by shifting the base block size to the left by the logarithmic block size value in the superblock (a simple 1024 << LOGBLOCKVALUE).
  3. Calculate the logical block number of the superblock by dividing the SuperBlock offset by the block size.
  4. Calculate the number of Block Groups by dividing the SuperBlock inodes count value by the SuperBlock inodes per group value.
  5. Read all the block group descriptors into an array (they're stored sequentially). The position of the first descriptor is at BLOCKSIZE*(SUPERBLOCK_BLOCKNUMBER + 1).
  6. Read the root directory inode (always 2). In order to do that you need to know which group of inodes the inode belongs to (inodes are split into several groups, one for each block group), the position of the inode table, and the position of that inode within the group. The group number is simply (INODENUMBER-1) divided by the INODES_PER_GROUP value of the Superblock (inode numbers start at 1 and 0 is reserved, so in the calculations you always have to subtract 1). Then to find the position of the inode table you read the INODES_TABLE_BLOCK_NUMBER of the matching group from the block group descriptors array read before and multiply it by the block size. Then to find the byte offset of the inode within that inode table you first find the index of the inode in the table by taking (INODENUMBER-1) modulo the superblock INODES_PER_GROUP value. Then you multiply that index by the INODE_STRUCTURE_SIZE value of the superblock, which gives the byte offset inside the table you need to read at.
  7. Then you need a function to read directories or files; reading both is actually exactly the same. Every inode has a BLOCKS array containing 15 block numbers: the first 12 items are direct block numbers, the 13th item is the number of a block that is entirely used for storing additional block numbers, and the 14th and 15th work the same way but with two and three levels of indirection. There's a value in the inode (BLOCKSCOUNT) that tells how much there is to read; note that the on-disk field (i_blocks) counts 512-byte units rather than filesystem blocks and includes the indirect blocks themselves, so the inode size is the safer bound for the data. Block numbers equal to zero are non-allocated blocks (used in sparse files).
  8. Once you have the directory data, you read the directory structures one by one, moving forward by the record length value each time. The file name is adjacent to the structure and the file type is stored in the directory entry itself. The list ends when you run out of directory data; treat a record length of 0 as the end as well, since you could not advance past it. If a directory structure contains an inode with the value 0 it means the item has been deleted (which explains why recovering files on EXT2 was quite the nightmare: you no longer know where the inode is and there's no other name associated with it).
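Steps 1 to 7 boil down to a handful of integer calculations. Here is a rough sketch in C, assuming the superblock and group descriptors have already been read. The struct ext2_geom container and the function names are made up for illustration; only the field names in the comments (s_log_block_size, bg_inode_table and so on) come from the kernel's ext2 headers.

```c
#include <assert.h>
#include <stdint.h>

#define EXT2_SUPERBLOCK_OFFSET 1024u /* step 1: fixed byte offset in the partition */

/* Hypothetical holder for the superblock values the steps rely on. */
struct ext2_geom {
    uint32_t log_block_size;   /* s_log_block_size */
    uint32_t inodes_count;     /* s_inodes_count */
    uint32_t inodes_per_group; /* s_inodes_per_group */
    uint32_t inode_size;       /* s_inode_size */
};

/* Step 2: block size in bytes. */
uint32_t ext2_block_size(const struct ext2_geom *g)
{
    return 1024u << g->log_block_size;
}

/* Step 3: block number holding the superblock (1 for 1 KiB blocks, else 0). */
uint32_t ext2_superblock_block(const struct ext2_geom *g)
{
    return EXT2_SUPERBLOCK_OFFSET / ext2_block_size(g);
}

/* Step 4: number of block groups. */
uint32_t ext2_group_count(const struct ext2_geom *g)
{
    assert(g->inodes_per_group != 0);
    return g->inodes_count / g->inodes_per_group;
}

/* Step 5: byte offset of the first group descriptor. */
uint64_t ext2_group_desc_offset(const struct ext2_geom *g)
{
    return (uint64_t)ext2_block_size(g) * (ext2_superblock_block(g) + 1u);
}

/* Step 6: group whose inode table holds inode number ino (numbers start at 1). */
uint32_t ext2_inode_group(const struct ext2_geom *g, uint32_t ino)
{
    assert(ino != 0 && g->inodes_per_group != 0);
    return (ino - 1u) / g->inodes_per_group;
}

/* Step 6: byte offset of the inode, given bg_inode_table of its group. */
uint64_t ext2_inode_offset(const struct ext2_geom *g,
                           uint32_t inode_table_block, uint32_t ino)
{
    uint32_t index = (ino - 1u) % g->inodes_per_group;
    return (uint64_t)inode_table_block * ext2_block_size(g)
         + (uint64_t)index * g->inode_size;
}

/* Step 7: indirection level (0 = direct ... 3 = triple) of a file's n-th block. */
int ext2_block_level(uint64_t n, uint32_t block_size)
{
    uint64_t per_block = block_size / 4u; /* 32-bit block numbers per block */
    if (n < 12) return 0;
    n -= 12;
    if (n < per_block) return 1;
    n -= per_block;
    if (n < per_block * per_block) return 2;
    return 3;
}
```

For the root directory, ext2_inode_group(&g, 2) is 0, so its inode sits at index 1 of the first group's inode table. With 1024-byte blocks, the first block that needs the single-indirect block is block number 12 of the file, and the double-indirect range starts at 268.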

The longest part is reading the block numbers, but I think the whole implementation can be done in 100 lines or less in an unmanaged language, especially if you don't care about reading the whole structures.
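The directory walk from the last step can be sketched like this, as a minimal example rather than a complete implementation. It assumes one directory block has already been read into memory and that the host is little-endian like the on-disk format; struct dirent_head and find_in_dir_block are hypothetical names, not taken from the kernel or from the linked project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed 8-byte header of an ext2 directory entry; the name follows it. */
struct dirent_head {
    uint32_t inode;     /* 0 means the slot is unused or deleted */
    uint16_t rec_len;   /* distance in bytes to the next entry */
    uint8_t  name_len;  /* length of the name, not NUL-terminated */
    uint8_t  file_type; /* meaningful when the filetype feature is enabled */
};

/* Returns the inode number of `name` within one directory block, or 0 if
 * it is not there. Call it once per data block of the directory. */
uint32_t find_in_dir_block(const uint8_t *block, size_t block_size,
                           const char *name)
{
    assert(block != NULL && name != NULL);
    size_t want = strlen(name);
    size_t pos = 0;

    while (pos + sizeof(struct dirent_head) <= block_size) {
        struct dirent_head h;
        memcpy(&h, block + pos, sizeof h); /* copy to avoid unaligned reads */

        /* A record length that is too small or runs past the block cannot
         * be followed, so stop instead of looping forever. */
        if (h.rec_len < sizeof h || pos + h.rec_len > block_size)
            break;

        if (h.inode != 0 && h.name_len == want &&
            sizeof h + h.name_len <= h.rec_len &&
            memcmp(block + pos + sizeof h, name, want) == 0)
            return h.inode;

        pos += h.rec_len; /* move on by the record length */
    }
    return 0;
}
```

Comparing name_len before memcmp matters because names are stored without a terminator, and entries with inode 0 are skipped so that deleted names do not count as existing files.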
