• 0

[C#] Reading text from MS Word files


Question

Hey guys. How can I read the text from an MS Word file, and possibly other MS Office files, while using the least possible resources? If someone could share some code they already have for this, it would be really sweet :p

Basically, what I'm trying to do is build a desktop search application like MSN's, Google's and the others, in C#, for a class project. Mine doesn't have to be as complex or as feature-rich, just a basic version of what they do.

Any other tips would also be appreciated.

Thanks,

Danish

PS: I'm storing the data I'm indexing in an MS Access file, which seems inefficient to me. Is there a better way to do that?

  • 0

What you'll probably have to do is add a COM reference to the Microsoft Word Object Library and access Word that way, but the Word API isn't the friendliest thing in the world. This link should help, though:

http://www.codeproject.com/aspnet/wordapplication.asp

However, a better method might be to drop Access and use MS SQL Server if you can, as that will let you store Office documents inside the database, which it will index for you and let you search against their contents. Effectively, this will do all the hard work for you. I'm not sure whether MSDE / SQL Express also does this.

  • 0

Hey man, thanks for the response.

OK, the API is a bit resource-intensive, since it involves loading an instance of the MS Word application into memory. Since I want to index files quickly, and there will be plenty of file types to index, the overhead of opening each MS Office application will be too much. Isn't there some DLL or something that can be used to extract the text from Office files?

Secondly, regarding MS SQL Server, the problem is that this has to be a redistributable desktop application. With MS Access, I can easily bundle an .mdb file with empty tables in the project. But with MS SQL Server, I can't even be sure whether the client PCs will have it or not.

Thanks for all your help,

Danish

  • 0

If there is a DLL that lets you peek into the file, then I haven't heard of it; everything I've done with Word was through the API. I see your issue with SQL Server. The other option would have been using MSDE / SQL Express, which you could include in your app, but unfortunately these don't include full-text search, for size reasons. The last I heard, this wasn't being reconsidered. :(

Perhaps your best option is to use the IFilter interface (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixrefint_9sfm.asp), which I'm led to believe is what MSN Desktop Search uses to look inside files. I can't offer much info on these, as they were new to me as of ten minutes ago, but I'd imagine there are Word and other Office filters already floating around somewhere which you could use.

Hope this helps...

  • 0

Very easy (copied from an app I made). It's in VB.NET, but that shouldn't stop you. You need references to the Word interop DLLs:

            'Cheap way of opening Word docs: open as a document, save as a text file, then read the text file.
            Dim wWordApp As Word.Application = New Word.Application
            wWordApp.DisplayAlerts = Word.WdAlertLevel.wdAlertsNone

            Dim dFile As Word.Document = wWordApp.Documents.Open(CType(sFilename, Object))

            'Write the plain-text copy next to the executable.
            dFile.SaveAs(Path.Combine(Path.GetDirectoryName(Application.ExecutablePath), "temp.txt"), Word.WdSaveFormat.wdFormatText)
            dFile.Close()
            'Shut down the Word instance so WINWORD.EXE does not stay in memory.
            wWordApp.Quit()


  • 0

Hey man, thanks for that tip. But you see, it involves creating an object of the Word application class, which takes up a lot of resources. Imagine I'm indexing files on the fly as they get modified by the user. I might then have a scenario where the user is working on a Word document, an Excel spreadsheet and a PowerPoint presentation at the same time, making changes to all three.

I would be required to create instances of the Word, Excel and PowerPoint applications over and over again, which would just make the resource consumption ghastly :p

Thanks for replying, though. I really appreciate it :)

  • 0

I'm not sure how well this would work, but if you open a Word .doc in a plain text editor (e.g. Notepad), you'll see a lot of unrecognizable characters plus the actual characters of the text that was written in Word. Since it sounds as though you are only interested in the text, and not in any of the formatting, you could try opening and reading the .doc as a text file in your program (see the TextReader class). Because Word may store newlines differently, you may be limited to the Read() or ReadToEnd() methods. Perhaps store the result of ReadToEnd() in a string and apply a regular expression (Regex class) to keep only the "real" characters, e.g. \w and \s to match word characters (letters and digits) and whitespace (mostly plain spaces, since Word has probably stored tabs, newlines, etc. in its own way). I'm not sure how well this would work for other Office documents, however...
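
As a rough illustration of the raw-scan idea above, here is a minimal C# sketch. The names RawTextScan and ExtractRuns, and the minimum run length, are made up for this example and do not come from any post in this thread. It reads bytes rather than going through TextReader, and it matches an explicit ASCII character set instead of \w, because in .NET \w also matches accented letters, which random binary bytes can easily map to. Treat it as a heuristic only: text that Word stored as 16-bit characters would be missed, and stray binary data can still look like text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class RawTextScan
{
    // Returns runs of letters, digits, spaces and basic punctuation
    // that are at least minRun characters long (hypothetical helper).
    public static List<string> ExtractRuns(byte[] data, int minRun)
    {
        // Map each byte to exactly one char so nothing is lost or merged by a decoder.
        StringBuilder raw = new StringBuilder(data.Length);
        foreach (byte b in data)
            raw.Append((char)b);

        Regex pattern = new Regex("[A-Za-z0-9 ,.;:!?'()-]{" + minRun + ",}");
        List<string> runs = new List<string>();
        foreach (Match m in pattern.Matches(raw.ToString()))
        {
            string run = m.Value.Trim();
            if (run.Length > 0)   // skip runs that were only spaces
                runs.Add(run);
        }
        return runs;
    }
}
```

Usage would be along the lines of RawTextScan.ExtractRuns(File.ReadAllBytes(path), 4), with System.IO imported. Raising the minimum run length cuts down on false positives at the cost of dropping short words.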

  • 0

Hey everybody, thanks a lot for all the responses.

I've finally managed to figure this one out. The answer lies in the use of IFilters, as gooey suggested.

After a lot of searching on the net and a bit of tweaking, I've made a C# class that can extract the text from .doc, .xls and .ppt files. I'll post the code shortly, but I must warn you that this class has very primitive error checking, and although it hardly ever crashes, I don't think it's ready for distribution. If somebody ever improves on this, please do post a version here, or mail it to me. Thanks a lot.

  • 0

Hi!

I'm working on a similar project, but for a document management repository. I also need to parse .doc files, although I'm using the Lucene.Net project for storing and retrieving indexes.

Can you please post the C# class in its current state?

Thanks in Advance,

Tiago

  • 0

Hey,

I tried to post the code earlier on, but the Neowin server kept giving me errors. Sorry about that. I'm trying again now; hopefully it works this time.

Edited:

It works!!

OK, this is how to use it:

Add a new code file to your project and copy and paste all of this code into it.

Create an OfficeFileReader.OfficeFileReader object and call its GetText method.

The syntax is as follows:

public static void Main()
{
  OfficeFileReader.OfficeFileReader objOFR = new OfficeFileReader.OfficeFileReader();
  string output = "";
  objOFR.GetText("C:\\MyWordFile.Doc", ref output);
  Console.WriteLine(output);
}

///==============================================================
/// Office File Reader
///==============================================================
using System;
using System.Text;
using System.Runtime.InteropServices;

namespace OfficeFileReader
{
    #region Stuff you Dont even need to look at

    [Flags]
    public enum IFILTER_INIT
    {
        NONE = 0,
        CANON_PARAGRAPHS = 1,
        HARD_LINE_BREAKS = 2,
        CANON_HYPHENS = 4,
        CANON_SPACES = 8,
        APPLY_INDEX_ATTRIBUTES = 16,
        APPLY_CRAWL_ATTRIBUTES = 256,
        APPLY_OTHER_ATTRIBUTES = 32,
        INDEXING_ONLY = 64,
        SEARCH_LINKS = 128,
        FILTER_OWNED_VALUE_OK = 512
    }

    [Flags]
    public enum IFILTER_FLAGS
    {
        OLE_PROPERTIES = 1
    }

    public enum CHUNK_BREAKTYPE
    {
        CHUNK_NO_BREAK = 0,
        CHUNK_EOW = 1,
        CHUNK_EOS = 2,
        CHUNK_EOP = 3,
        CHUNK_EOC = 4
    }

    [Flags]
    public enum CHUNKSTATE
    {
        CHUNK_TEXT = 0x1,
        CHUNK_VALUE = 0x2,
        CHUNK_FILTER_OWNED_VALUE = 0x4
    }

    public enum PSKIND
    {
        LPWSTR = 0,
        PROPID = 1
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct PROPSPEC
    {
        public uint ulKind;
        public uint propid;
        public IntPtr lpwstr;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct FULLPROPSPEC
    {
        public Guid guidPropSet;
        public PROPSPEC psProperty;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct STAT_CHUNK
    {
        public uint idChunk;
        [MarshalAs(UnmanagedType.U4)]
        public CHUNK_BREAKTYPE breakType;
        [MarshalAs(UnmanagedType.U4)]
        public CHUNKSTATE flags;
        public uint locale;
        [MarshalAs(UnmanagedType.Struct)]
        public FULLPROPSPEC attribute;
        public uint idChunkSource;
        public uint cwcStartSource;
        public uint cwcLenSource;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct FILTERREGION
    {
        public uint idChunk;
        public uint cwcStart;
        public uint cwcExtent;
    }

    #endregion

    [ComImport]
    [Guid("89BCB740-6119-101A-BCB7-00DD010655AF")]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IFilter
    {
        void Init([MarshalAs(UnmanagedType.U4)] IFILTER_INIT grfFlags,
                  uint cAttributes,
                  [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] FULLPROPSPEC[] aAttributes,
                  ref uint pdwFlags);

        void GetChunk([MarshalAs(UnmanagedType.Struct)] out STAT_CHUNK pStat);

        [PreserveSig]
        int GetText(ref uint pcwcBuffer, [MarshalAs(UnmanagedType.LPWStr)] StringBuilder buffer);

        void GetValue(ref UIntPtr ppPropValue);

        void BindRegion([MarshalAs(UnmanagedType.Struct)]FILTERREGION origPos, ref Guid riid, ref UIntPtr ppunk);
    }

    [ComImport]
    [Guid("f07f3920-7b8c-11cf-9be8-00aa004b9986")]
    public class CFilter
    {
    }

    public class Constants
    {
        public const uint PID_STG_DIRECTORY = 0x00000002;
        public const uint PID_STG_CLASSID = 0x00000003;
        public const uint PID_STG_STORAGETYPE = 0x00000004;
        public const uint PID_STG_VOLUME_ID = 0x00000005;
        public const uint PID_STG_PARENT_WORKID = 0x00000006;
        public const uint PID_STG_SECONDARYSTORE = 0x00000007;
        public const uint PID_STG_FILEINDEX = 0x00000008;
        public const uint PID_STG_LASTCHANGEUSN = 0x00000009;
        public const uint PID_STG_NAME = 0x0000000a;
        public const uint PID_STG_PATH = 0x0000000b;
        public const uint PID_STG_SIZE = 0x0000000c;
        public const uint PID_STG_ATTRIBUTES = 0x0000000d;
        public const uint PID_STG_WRITETIME = 0x0000000e;
        public const uint PID_STG_CREATETIME = 0x0000000f;
        public const uint PID_STG_ACCESSTIME = 0x00000010;
        public const uint PID_STG_CHANGETIME = 0x00000011;
        public const uint PID_STG_CONTENTS = 0x00000013;
        public const uint PID_STG_SHORTNAME = 0x00000014;

        public const int FILTER_E_END_OF_CHUNKS = (unchecked((int)0x80041700));
        public const int FILTER_E_NO_MORE_TEXT = (unchecked((int)0x80041701));
        public const int FILTER_E_NO_MORE_VALUES = (unchecked((int)0x80041702));
        public const int FILTER_E_NO_TEXT = (unchecked((int)0x80041705));
        public const int FILTER_E_NO_VALUES = (unchecked((int)0x80041706));
        public const int FILTER_S_LAST_TEXT = (unchecked((int)0x00041709));
    }

    public class OfficeFileReader
    {
        // path is the path of the .doc, .xls or .ppt file
        // text is the variable in which all the extracted text will be stored
        public void GetText(String path, ref string text)
        {
            String result = "";
            int count = 0;
            try
            {
                IFilter ifilt = (IFilter)(new CFilter());
                //System.Runtime.InteropServices.UCOMIPersistFile ipf = (System.Runtime.InteropServices.UCOMIPersistFile)(ifilt);
                System.Runtime.InteropServices.ComTypes.IPersistFile ipf= (System.Runtime.InteropServices.ComTypes.IPersistFile)(ifilt);
                ipf.Load(@path, 0);
                uint i = 0;
                STAT_CHUNK ps = new STAT_CHUNK();
                ifilt.Init(IFILTER_INIT.NONE, 0, null, ref i);
                int hr = 0;

                while (hr == 0)
                {
                    ifilt.GetChunk(out ps);
                    if (ps.flags == CHUNKSTATE.CHUNK_TEXT)
                    {
                        uint pcwcBuffer = 1000;
                        int hr2 = 0;
                        while (hr2 == Constants.FILTER_S_LAST_TEXT || hr2 == 0)
                        {
                            try
                            {
                                pcwcBuffer = 1000;
                                System.Text.StringBuilder sbBuffer = new StringBuilder((int)pcwcBuffer);
                                hr2 = ifilt.GetText(ref pcwcBuffer, sbBuffer);
                                // Console.WriteLine(pcwcBuffer.ToString());
                                if (hr2 >= 0) result += sbBuffer.ToString(0, (int)pcwcBuffer);
                                //textBox1.Text +="\n";
                                // result += "#########################################";
                                count++;
                            }
                            catch (System.Runtime.InteropServices.COMException myE)
                            {
                                Console.WriteLine(myE.Data + "\n" + myE.Message + "\n");
                            }
                        }
                    }
                }
            }
            catch (System.Runtime.InteropServices.COMException myE)
            {
                Console.WriteLine(myE.Data + "\n" + myE.Message + "\n");
            }
            text = result;
            //return count;
            return;
        }
    }
}

Edited by dtmunir
  • 0

Hi, I get these errors while trying to build this as a class library. Any ideas?

Preparing resources...

Updating references...

Performing main compilation...

e:\documents and settings\upake\my documents\visual studio projects\classlibrary2\officefilereader.cs(308,36): error CS0234: The type or namespace name 'ComTypes' does not exist in the class or namespace 'System.Runtime.InteropServices' (are you missing an assembly reference?)

e:\documents and settings\upake\my documents\visual studio projects\classlibrary2\officefilereader.cs(309,5): error CS0246: The type or namespace name 'ipf' could not be found (are you missing a using directive or an assembly reference?)

e:\documents and settings\upake\my documents\visual studio projects\classlibrary2\officefilereader.cs(338,27): error CS0117: 'System.Runtime.InteropServices.COMException' does not contain a definition for 'Data'

e:\documents and settings\upake\my documents\visual studio projects\classlibrary2\officefilereader.cs(349,23): error CS0117: 'System.Runtime.InteropServices.COMException' does not contain a definition for 'Data'

Please help

  • 0

OK, I don't have VS 2003 or .NET 1.1.

I built this on VS 2005 and .NET 2.0 Beta.

But I don't think that should be a problem, because the code isn't mine, and the site I took it from didn't build it on .NET 2.0.

Since I can't figure out what the problem is, if you want, I could compile this code into a DLL for you to use.

Or if anyone else has managed to make this code work on VS 2003, please share...

  • 0

Hi,

That would be great. Can you mail the DLL to [email protected]?

Thanks man

Upake

  • 0

Thanks a lot, this is exactly what I was looking for.

I'm also using VS .NET 2003 and I got the same compiler errors, but managed to fix them, although I don't fully know what I'm doing yet, hehe.

I can now read Excel, Word and PowerPoint files. I tried installing the Adobe PDF IFilter, but I still can't read PDFs. Does anyone have an idea how to get PDFs to work?

To fix the compiler errors, swap which of these two lines is commented out. Change:

 //System.Runtime.InteropServices.UCOMIPersistFile ipf = (System.Runtime.InteropServices.UCOMIPersistFile)(ifilt);
System.Runtime.InteropServices.ComTypes.IPersistFile ipf= (System.Runtime.InteropServices.ComTypes.IPersistFile)(ifilt);

to:

System.Runtime.InteropServices.UCOMIPersistFile ipf = (System.Runtime.InteropServices.UCOMIPersistFile)(ifilt);
//System.Runtime.InteropServices.ComTypes.IPersistFile ipf= (System.Runtime.InteropServices.ComTypes.IPersistFile)(ifilt);

  • 0

You need to change the GUID value of the Guid attribute on CFilter. The class is as below:

[ComImport()]
[Guid("4C904448-74A9-11d0-AF6E-00C04FD8DC02")]
public class CFilter
{
}

Now you can read PDF files as text.

I have some code lying around on my home PC which returns the right IFilter for a given file (provided, of course, that an IFilter is installed for that file type). As soon as I get home, I will post the code.

Umer

  • 0

Just to add: I also had to remove the myE.Data to get it to work. It seems to be another 2005 thing?

Anyway, I wonder if there is some way to know when the last chunk has been read? It seems to just keep reading until an exception is thrown. The variable hr might have been intended for this, but it is never changed in the code. Any idea what hr could stand for?

  • 0

I guess I found the solution to the problem myself.

I had never tried any COM interop before this, so quite a bit of the "magic" is confusing. But if I change

void GetChunk([MarshalAs(UnmanagedType.Struct)] out STAT_CHUNK pStat);

to

[PreserveSig]
int GetChunk([MarshalAs(UnmanagedType.Struct)] out STAT_CHUNK pStat);

it doesn't throw exceptions any more, but instead returns error values like the original native method does.

Sorry, I'm new to this forum. It might be trivial, but it caused me a great deal of headache :)
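
For anyone following the return-code approach described above: hr conventionally stands for HRESULT, the integer status code that COM methods return, and by COM convention a negative value means failure. The sketch below shows one way such values could be interpreted. The class and method names (FilterResult, Succeeded, Describe) are invented for this illustration; the constant values are copied from the Constants class posted earlier in the thread.

```csharp
using System;

public static class FilterResult
{
    // Values copied from the Constants class posted earlier in this thread.
    public const int S_OK = 0;
    public const int FILTER_E_END_OF_CHUNKS = unchecked((int)0x80041700);
    public const int FILTER_E_NO_MORE_TEXT = unchecked((int)0x80041701);
    public const int FILTER_S_LAST_TEXT = 0x00041709;

    // COM convention: HRESULTs with the high bit set (negative ints) are failures.
    public static bool Succeeded(int hr)
    {
        return hr >= 0;
    }

    // Hypothetical helper that turns a return code into a readable label.
    public static string Describe(int hr)
    {
        switch (hr)
        {
            case S_OK: return "ok";
            case FILTER_S_LAST_TEXT: return "ok, last text of this chunk";
            case FILTER_E_NO_MORE_TEXT: return "no more text in this chunk";
            case FILTER_E_END_OF_CHUNKS: return "no more chunks, stop";
            default:
                return Succeeded(hr) ? "other success code" : "error 0x" + hr.ToString("X8");
        }
    }
}
```

With GetChunk declared as returning int, a loop could then stop when the returned value is FILTER_E_END_OF_CHUNKS instead of waiting for an exception.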

  • 0

Use the class below together with the class posted by dtmunir earlier, and use the sample below to parse any kind of file whose IFilter is installed on the machine.

I got this code sample from CodeProject, from a project named Office Desktop Search. I found it very useful, so I thought I should post it here.

And my boss doesn't mind me using the Internet at the workplace, Danish bhai. They don't even know I have something known as a USB flash drive (256 MB) in my pocket all the time ;) but I try to be as sincere as possible!!! I have never copied any office file onto the USB drive :shifty: :rolleyes:

Looks like as if we are

Umer

// Sample for using the class below
public string ParseFile(string filename)
{
  if (Parser.IsParseable(filename))
  {
    return Parser.Parse(filename);
  }
  else
  {
    return ""; // or throw an exception or whatever
  }
}

// Add all this code to another file and place it alongside the class posted by dtmunir.
// Note: this code checks the value returned by GetChunk, so IFilter.GetChunk must be
// declared as "[PreserveSig] int GetChunk(...)", as described in the post above.
using System;
using System.Runtime.InteropServices;
using System.Text;

namespace OfficeFileReader
{
    /// <summary>
    /// Loads the IFilter registered for a file and extracts the file's plain text.
    /// </summary>
    public class Parser
    {
        // The original sample compared against IFilterReturnCodes.S_OK, an enum that
        // was not included in the post; S_OK is 0.
        private const int S_OK = 0;

        public Parser()
        {
        }

        [DllImport("query.dll", CharSet = CharSet.Unicode)]
        private extern static int LoadIFilter(string pwcsPath, ref IUnknown pUnkOuter, ref IFilter ppIUnk);

        [ComImport, Guid("00000000-0000-0000-C000-000000000046")]
        [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
        private interface IUnknown
        {
            [PreserveSig]
            IntPtr QueryInterface(ref Guid riid, out IntPtr pVoid);

            [PreserveSig]
            IntPtr AddRef();

            [PreserveSig]
            IntPtr Release();
        }

        private static IFilter loadIFilter(string filename)
        {
            IUnknown iunk = null;
            IFilter filter = null;

            // Try to load the corresponding IFilter
            int resultLoad = LoadIFilter(filename, ref iunk, ref filter);
            if (resultLoad != S_OK)
            {
                return null;
            }
            return filter;
        }

/*
        private static IFilter loadIFilterOffice(string filename)
        {
            IFilter filter = (IFilter)(new CFilter());
            System.Runtime.InteropServices.UCOMIPersistFile ipf = (System.Runtime.InteropServices.UCOMIPersistFile)(filter);
            ipf.Load(filename, 0);

            return filter;
        }
*/

        public static bool IsParseable(string filename)
        {
            IFilter filter = loadIFilter(filename);
            if (filter == null)
                return false;

            // Release the filter that was loaded only for this check.
            Marshal.ReleaseComObject(filter);
            return true;
        }

        public static string Parse(string filename)
        {
            IFilter filter = null;

            try
            {
                StringBuilder plainTextResult = new StringBuilder();
                filter = loadIFilter(filename);
                if (filter == null)
                    return ""; // no IFilter available for this file

                STAT_CHUNK ps = new STAT_CHUNK();
                IFILTER_INIT mFlags = 0;

                uint i = 0;
                filter.Init(mFlags, 0, null, ref i);

                int resultChunk = 0;

                resultChunk = filter.GetChunk(out ps);
                while (resultChunk == 0)
                {
                    if (ps.flags == CHUNKSTATE.CHUNK_TEXT)
                    {
                        uint sizeBuffer = 60000;
                        int resultText = 0;
                        while (resultText == Constants.FILTER_S_LAST_TEXT || resultText == 0)
                        {
                            sizeBuffer = 60000;
                            System.Text.StringBuilder sbBuffer = new System.Text.StringBuilder((int)sizeBuffer);
                            resultText = filter.GetText(ref sizeBuffer, sbBuffer);

                            if (sizeBuffer > 0 && sbBuffer.Length > 0)
                            {
                                string chunk = sbBuffer.ToString(0, (int)sizeBuffer);
                                plainTextResult.Append(chunk);
                            }
                        }
                    }
                    resultChunk = filter.GetChunk(out ps);
                }
                return plainTextResult.ToString();
            }
            finally
            {
                if (filter != null)
                    Marshal.ReleaseComObject(filter);
            }
        }
    }
}

  • 0

This works too, but in all the approaches I have tried so far, I always get an application error, but only with pdf files:

(ReadFile.exe is the name of my assembly)

Font Capture: ReadFile.exe - Application Error

The instruction at "0x030a61b3" referenced memory at "0x03a823e8". The memory could not be "read"

This always happens when my program closes. It works perfectly fine until I exit Main()...

I wonder if this has something to do with the Adobe IFilter not being released properly?
