• 0

[C#] Reading text from MS Word files


Question

hey guys. how can i read the text from an MS word, and possibly other Ms Office files while using the least possible resources. If someone could share some code they already have for this, it wud be really sweet :p

basically what im trying to do is build a desktop searching application like MSNs googles and the others in C# for a class project. but mine doesnt have to be as complex or as feature rich. just a basic version of what they do.

any other tips wud also be appreciated.

thanks

danish

ps: im storing the data im indexing in an MS Access file. seems inefficient to me. any better way to do that?

Link to comment
https://www.neowin.net/forum/topic/316480-c-reading-text-from-ms-word-files/
Share on other sites

Recommended Posts

  • 0

What you'll probably have to do is add a COM reference to the Microsoft Word Object library and access word that way, but the word API isn't the friendliest thing in the world. this link should help though....

http://www.codeproject.com/aspnet/wordapplication.asp

However a better method might be to dump Access and use MS-SQL if you can as that will let you store office documents inside it, which it will index for you and let you search against the contents of these. Effectivly this will do all the hard work for you. Not sure if MSDE / SQL Express also does this.

  • 0

hey man. thanks for the response.

ok, the api is a bit resource intensive, since it involves loading up an instance of the ms word app in the memory. since i want to index files quickly, and there will be plenty of file types to index, the overhead of opening up each ms office app will be too much. isnt there some dll or somethin that can be used to index text from office files?

secondly regarding ms sql, the problem is that this has to be a redistributable desktop application. with ms accesss, i can easily bundle an mdb file with empty tables in the project. but with ms sql, i cant even be sure if the client pcs will hv it or not.

thanks for all ur help

danish

  • 0

If there is a dll that lets you peek into the file then I haven't heard of it, all the things I've done with Word were through the API. I see your issue with SQL server, the other option would have been usign MSDE \ SQL Express which you could include in your app, but unfortuantly these don't inlcude full text searchs because of size reasons. The last I heard this wasn't being re-considered. :(

Perhaps your best option is to use the IFILTER interface (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixrefint_9sfm.asp) which is what I'm lead to believe the MSN Desktop search uses to look inside files. I can't offer much info on these as they are new to me as of ten mins ago, but I'd imagine there would be Word and other office interfaces already floating around somewhere which you could use.

Hope this helps...

  • 0

Very easy (copied from an app i made) its in vb.net but that shouldnt stop you. You need references to the word.interop dlls:

            'Cheap way of opening word docs, open as a doc, and save as a text file. Then open the text file!
            Dim wWordApp As Word.Application = New Word.Application
            wWordApp.DisplayAlerts = Word.WdAlertLevel.wdAlertsNone

            Dim dFile As Word.Document = wWordApp.Documents.Open(CType(sFilename, Object))

            dFile.SaveAs(Path.GetDirectoryName(Application.ExecutablePath) + "\temp.txt", Word.WdSaveFormat.wdFormatText)
            dFile.Close()


  • 0

hey man. thanks for tht tip. but u see it involves creating an object of the wordapp class, which takes up a lot of resources. imagine if im indexing files on the fly as they get modified by the user. then i might hv a scenario where the user is working on a word doc, and xls spreadsheet and a powerpoint presentation at the same time, making changes to all 3.

i wud be repeatedly required to create instances of wordapp, excelapp and powerpointapp over and over again, which wud just make the resource consumption ghastly :p

thanks for replyin tho. really appreicate it :)

  • 0

I'm not sure how well this would work, but... If you were to open a Word .doc in a plain text editor (eg Notepad), you'd see a lot of unrecognizable characters plus the actual characters of the text that was written in Word. Since it sounds as though ou are only interested in the text, and not any of the formating, you may try opening and reading the Word .doc as a text file in your program (see TextReader class). Because Word may have stored newlines differently, you may be stuck with using only Read() or ReadtoEnd() methods. Perhaps ReadToEnd() stored in a string, and apply a regular expression (RegEx class) to parse this to only include "real" characters (eg \w\s to match word characters (digits and alphabet) and white space characters (mostly " " spaces, since Word would probably have goofed-up tabs, newlines, etc.)). I'm not sure how well this would work for other Office documents, however....

  • 0

hey everybody. thanks a lot for all the responses.

ive finally managed to figure this one out. the answer lies in the use of IFilters, as gooey suggested.

after a lot of searching on the net and a bit of tweaking, ive ,made a C Sharp class that can extract the text frrom .doc, .xls and .ppt files. ill post the code shortly, but i must warn that this class has very primitive error checking, and although it hardly ever crahses, its not feasable for distribution i would think. if somebody ever improves on this, pls do post a version here, or mail it to me. thanks a lot.

  • 0

Hi!

I'm working in a similar project but for a document management repository. I also need to parse doc files, although I'm using the Lucene .net project for storing and retrieving indexes.

Can you please post the C Sharp class in its actual status?

Thanks in Advance,

Tiago

  dtmunir said:
hey everybody. thanks a lot for all the responses.

ive finally managed to figure this one out. the answer lies in the use of IFilters, as gooey suggested.

after a lot of searching on the net and a bit of tweaking, ive ,made a C Sharp class that can extract the text frrom .doc, .xls and .ppt files. ill post the code shortly, but i must warn that this class has very primitive error checking, and although it hardly ever crahses, its not feasable for distribution i would think. if somebody ever improves on this, pls do post a version here, or mail it to me. thanks a lot.

585939867[/snapback]

  • 0

hey

i tried to post the code earlier on, but the newowin server kept giving me errors.sorry abt it. im tryin again now. hopefully it works this time.

Edited:

It works!!

ok this is how to use this:

add a new code file to ur project and just copy past all this code.

create a OfficeFileReader.OfficeFileReader object, can call the method GetText.

the syntax is as follows:

  Quote
public static void Main()

{

  OfficeFileReader.OfficeFileReader objOFR = new OfficeFileReader.OfficeFileReader()

  string output="";

  objOFR.GetText("C:\\MyWordFile.Doc", ref output);

  Console.WriteLine(output);

}

  Quote

///==============================================================

/// Office File Reader

///==============================================================

using System;

using System.Text;

using System.Runtime.InteropServices;

namespace OfficeFileReader

{

? ? #region Stuff you Dont even need to look at

? ? [Flags]

? ? public enum IFILTER_INIT

? ? {

? ? ? ? NONE = 0,

? ? ? ? CANON_PARAGRAPHS = 1,

? ? ? ? HARD_LINE_BREAKS = 2,

? ? ? ? CANON_HYPHENS = 4,

? ? ? ? CANON_SPACES = 8,

? ? ? ? APPLY_INDEX_ATTRIBUTES = 16,

? ? ? ? APPLY_CRAWL_ATTRIBUTES = 256,

? ? ? ? APPLY_OTHER_ATTRIBUTES = 32,

? ? ? ? INDEXING_ONLY = 64,

? ? ? ? SEARCH_LINKS = 128,

? ? ? ? FILTER_OWNED_VALUE_OK = 512

? ? }

? ? [Flags]

? ? public enum IFILTER_FLAGS

? ? {

? ? ? ? OLE_PROPERTIES = 1

? ? }

? ? public enum CHUNK_BREAKTYPE

? ? {

? ? ? ? CHUNK_NO_BREAK = 0,

? ? ? ? CHUNK_EOW = 1,

? ? ? ? CHUNK_EOS = 2,

? ? ? ? CHUNK_EOP = 3,

? ? ? ? CHUNK_EOC = 4

? ? }

? ? [Flags]

? ? public enum CHUNKSTATE

? ? {

? ? ? ? CHUNK_TEXT = 0x1,

? ? ? ? CHUNK_VALUE = 0x2,

? ? ? ? CHUNK_FILTER_OWNED_VALUE = 0x4

? ? }

? ? public enum PSKIND

? ? {

? ? ? ? LPWSTR = 0,

? ? ? ? PROPID = 1

? ? }

? ? [structLayout(LayoutKind.Sequential)]

? ? public struct PROPSPEC

? ? {

? ? ? ? public uint ulKind;

? ? ? ? public uint propid;

? ? ? ? public IntPtr lpwstr;

? ? }

? ? [structLayout(LayoutKind.Sequential)]

? ? public struct FULLPROPSPEC

? ? {

? ? ? ? public Guid guidPropSet;

? ? ? ? public PROPSPEC psProperty;

? ? }

? ? [structLayout(LayoutKind.Sequential)]

? ? public struct STAT_CHUNK

? ? {

? ? ? ? public uint idChunk;

? ? ? ? [MarshalAs(UnmanagedType.U4)]

? ? ? ? public CHUNK_BREAKTYPE breakType;

? ? ? ? [MarshalAs(UnmanagedType.U4)]

? ? ? ? public CHUNKSTATE flags;

? ? ? ? public uint locale;

? ? ? ? [MarshalAs(UnmanagedType.Struct)]

? ? ? ? public FULLPROPSPEC attribute;

? ? ? ? public uint idChunkSource;

? ? ? ? public uint cwcStartSource;

? ? ? ? public uint cwcLenSource;

? ? }

? ? [structLayout(LayoutKind.Sequential)]

? ? public struct FILTERREGION

? ? {

? ? ? ? public uint idChunk;

? ? ? ? public uint cwcStart;

? ? ? ? public uint cwcExtent;

? ? }

? ? #endregion

? ? [ComImport]

? ? [Guid("89BCB740-6119-101A-BCB7-00DD010655AF")]

? ? [interfaceType(ComInterfaceType.InterfaceIsIUnknown)]

? ? public interface IFilter

? ? {

? ? ? ? void Init([MarshalAs(UnmanagedType.U4)] IFILTER_INIT grfFlags,

? ? ? ? ? ? ? ? ? uint cAttributes,

? ? ? ? ? ? ? ? ? [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] FULLPROPSPEC[] aAttributes,

? ? ? ? ? ? ? ? ? ref uint pdwFlags);

? ? ? ? void GetChunk([MarshalAs(UnmanagedType.Struct)] out STAT_CHUNK pStat);

? ? ? ? [PreserveSig]

? ? ? ? int GetText(ref uint pcwcBuffer, [MarshalAs(UnmanagedType.LPWStr)] StringBuilder buffer);

? ? ? ? void GetValue(ref UIntPtr ppPropValue);

? ? ? ? void BindRegion([MarshalAs(UnmanagedType.Struct)]FILTERREGION origPos, ref Guid riid, ref UIntPtr ppunk);

? ? }

? ? [ComImport]

? ? [Guid("f07f3920-7b8c-11cf-9be8-00aa004b9986")]

? ? public class CFilter

? ? {

? ? }

? ? public class Constants

? ? {

? ? ? ? public const uint PID_STG_DIRECTORY = 0x00000002;

? ? ? ? public const uint PID_STG_CLASSID = 0x00000003;

? ? ? ? public const uint PID_STG_STORAGETYPE = 0x00000004;

? ? ? ? public const uint PID_STG_VOLUME_ID = 0x00000005;

? ? ? ? public const uint PID_STG_PARENT_WORKID = 0x00000006;

? ? ? ? public const uint PID_STG_SECONDARYSTORE = 0x00000007;

? ? ? ? public const uint PID_STG_FILEINDEX = 0x00000008;

? ? ? ? public const uint PID_STG_LASTCHANGEUSN = 0x00000009;

? ? ? ? public const uint PID_STG_NAME = 0x0000000a;

? ? ? ? public const uint PID_STG_PATH = 0x0000000b;

? ? ? ? public const uint PID_STG_SIZE = 0x0000000c;

? ? ? ? public const uint PID_STG_ATTRIBUTES = 0x0000000d;

? ? ? ? public const uint PID_STG_WRITETIME = 0x0000000e;

? ? ? ? public const uint PID_STG_CREATETIME = 0x0000000f;

? ? ? ? public const uint PID_STG_ACCESSTIME = 0x00000010;

? ? ? ? public const uint PID_STG_CHANGETIME = 0x00000011;

? ? ? ? public const uint PID_STG_CONTENTS = 0x00000013;

? ? ? ? public const uint PID_STG_SHORTNAME = 0x00000014;

? ? ? ? public const int FILTER_E_END_OF_CHUNKS = (unchecked((int)0x80041700));

? ? ? ? public const int FILTER_E_NO_MORE_TEXT = (unchecked((int)0x80041701));

? ? ? ? public const int FILTER_E_NO_MORE_VALUES = (unchecked((int)0x80041702));

? ? ? ? public const int FILTER_E_NO_TEXT = (unchecked((int)0x80041705));

? ? ? ? public const int FILTER_E_NO_VALUES = (unchecked((int)0x80041706));

? ? ? ? public const int FILTER_S_LAST_TEXT = (unchecked((int)0x00041709));

? ? ? ?

? ? }

? ? public class OfficeFileReader

? ? {?

? ? ? ? public void GetText(String path,ref string text)

? ? ? ? ? ? // path is the path of the .doc, .xls or .ppt? file

? ? ? ? ? ? // text is the variable in which all the extracted text will be stored

? ? ? ? {

? ? ? ? ? ? String result = "";

? ? ? ? ? ? int count = 0;

? ? ? ? ? ? try

? ? ? ? ? ? {

? ? ? ? ? ? ? ? IFilter ifilt = (IFilter)(new CFilter());

? ? ? ? ? ? ? ? //System.Runtime.InteropServices.UCOMIPersistFile ipf = (System.Runtime.InteropServices.UCOMIPersistFile)(ifilt);

? ? ? ? ? ? ? ? System.Runtime.InteropServices.ComTypes.IPersistFile ipf= (System.Runtime.InteropServices.ComTypes.IPersistFile)(ifilt);

? ? ? ? ? ? ? ? ipf.Load(@path, 0);

? ? ? ? ? ? ? ? uint i = 0;

? ? ? ? ? ? ? ? STAT_CHUNK ps = new STAT_CHUNK();

? ? ? ? ? ? ? ? ifilt.Init(IFILTER_INIT.NONE, 0, null, ref i);

? ? ? ? ? ? ? ? int hr = 0;

? ? ? ? ? ? ? ?

? ? ? ? ? ? ? ? while (hr == 0)

? ? ? ? ? ? ? ? {

? ? ? ? ? ? ? ? ? ?

? ? ? ? ? ? ? ? ? ? ? ? ifilt.GetChunk(out ps);

? ? ? ? ? ? ? ? ? ? ? ? if (ps.flags == CHUNKSTATE.CHUNK_TEXT)

? ? ? ? ? ? ? ? ? ? ? ? {

? ? ? ? ? ? ? ? ? ? ? ? ? ? uint pcwcBuffer = 1000;

? ? ? ? ? ? ? ? ? ? ? ? ? ? int hr2 = 0;

? ? ? ? ? ? ? ? ? ? ? ? ? ? while (hr2 == Constants.FILTER_S_LAST_TEXT || hr2 == 0)

? ? ? ? ? ? ? ? ? ? ? ? ? ? {

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? try

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? {

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? pcwcBuffer = 1000;

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? System.Text.StringBuilder sbBuffer = new StringBuilder((int)pcwcBuffer);

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? hr2 = ifilt.GetText(ref pcwcBuffer, sbBuffer);

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? // Console.WriteLine(pcwcBuffer.ToString());

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? if (hr2 >= 0) result += sbBuffer.ToString(0, (int)pcwcBuffer);

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? //textBox1.Text +="\n";

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? // result += "#########################################";

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? count++;

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? }

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? catch (System.Runtime.InteropServices.COMException myE)

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? {

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Console.WriteLine(myE.Data + "\n" + myE.Message + "\n");

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? }

? ? ? ? ? ? ? ? ? ? ? ? ? ? }

? ? ? ? ? ? ? ? ? ? ? ? }

? ? ? ? ? ? ? ? ? ?

? ? ? ? ? ? ? ? }

? ? ? ? ? ? ? ?

? ? ? ? ? ? }

? ? ? ? ? ? catch (System.Runtime.InteropServices.COMException myE)

? ? ? ? ? ? {

? ? ? ? ? ? ? ? Console.WriteLine(myE.Data + "\n" + myE.Message + "\n");

? ? ? ? ? ? }

? ? ? ? ? ? text = result;

? ? ? ? ? ? //return count;

?  return;

? ? ? ? }

? ? }

}

Edited by dtmunir
  • 0

Hi I get these errors while trying to make a class library, any ideas ???

Preparing resources...

Updating references...

Performing main compilation...

e:\documents and settings\upake\my documents\visual studio projects\classlibrary2\officefilereader.cs(308,36): error CS0234: The type or namespace name 'ComTypes' does not exist in the class or namespace 'System.Runtime.InteropServices' (are you missing an assembly reference?)

e:\documents and settings\upake\my documents\visual studio projects\classlibrary2\officefilereader.cs(309,5): error CS0246: The type or namespace name 'ipf' could not be found (are you missing a using directive or an assembly reference?)

e:\documents and settings\upake\my documents\visual studio projects\classlibrary2\officefilereader.cs(338,27): error CS0117: 'System.Runtime.InteropServices.COMException' does not contain a definition for 'Data'

e:\documents and settings\upake\my documents\visual studio projects\classlibrary2\officefilereader.cs(349,23): error CS0117: 'System.Runtime.InteropServices.COMException' does not contain a definition for 'Data'

Please help

  • 0

ok, i dont hv VS 2003, or .Net 1.1.

I built this on VS 2005 and .Net 2.0 Beta

but i dont think that should be a problem, b/c the code isnt mine, and the site i took it from didnt build it on .Net 2.0

since i cant figure out wat the problem is, if u want, i could compile this code into a dll, so that you could use.

or if any one else has managed to make this code work on VS 2003, pls share....

  • 0

Hi,

That will be great, can you mail the dll to upake.de.silva@gmail.com.

Thanks man

Upake

  dtmunir said:
ok, i dont hv VS 2003, or .Net 1.1.

I built this on VS 2005 and .Net 2.0 Beta

but i dont think that should be a problem, b/c the code isnt mine, and the site i took it from didnt build it on .Net 2.0

since i cant figure out wat the problem is, if u want, i could compile this code into a dll, so that you could use.

or if any one else has managed to make this code work on VS 2003, pls share....

586126041[/snapback]

  • 0

Thanks a lot, this is exactly what I was looking for.

I'm also using VS .NET 2003 and I got the same compiler errors but managed to fix it - I don't know yet what I'm doing, hehe.

I can now read excel, word and powerpoint files. I tried installing the adobe pdf ifilter, but I still can't read pdf's. Does anyone have an idea how to get pdfs to work?

Change the comments=

 //System.Runtime.InteropServices.UCOMIPersistFile ipf = (System.Runtime.InteropServices.UCOMIPersistFile)(ifilt);
System.Runtime.InteropServices.ComTypes.IPersistFile ipf= (System.Runtime.InteropServices.ComTypes.IPersistFile)(ifilt);

to:

System.Runtime.InteropServices.UCOMIPersistFile ipf = (System.Runtime.InteropServices.UCOMIPersistFile)(ifilt);
//System.Runtime.InteropServices.ComTypes.IPersistFile ipf= (System.Runtime.InteropServices.ComTypes.IPersistFile)(ifilt);

  • 0

u need to change the GUID value of the GUID Attribute at the CFilter. The class is as below

[ComImport()]

[Guid("4C904448-74A9-11d0-AF6E-00C04FD8DC02")]

public class CFilter

{

}

and now u can read pdf files as text.

I have some bit lying around in my home pc which returns iFilter according to the file (ofcourse if the ifilter is installed for that file type). As soon as i reach home. I will post the code.

Umer

  • 0

Just to add: I also had to remove the myE.Data to get it to work, seems to be another 2005 thing?

Anyway, I wonder if there is some way to know if the last chunk has been read? it seems to just try reading until an exception is thrown? The variable hr might have been intended for this, but it is never changed in the code - any idea what hr could stand for?

  • 0

I guess I found the solution to the problem myself.

I never tried any COM interop before this, so quite a bit of the "magic" is a bit confusing. But If I change

void GetChunk([MarshalAs(UnmanagedType.Struct)] out STAT_CHUNK pStat);

to

[PreserveSig]
int GetChunk([MarshalAs(UnmanagedType.Struct)] out STAT_CHUNK pStat);

It doesn't throw exceptions any more but instead returns error values like the original method.

Sorry i'm new to this forum, it might be trivial but it caused be a great deal of headache :)

  • 0

Use the below class together with the class posted by dtmunir before and use this below sample to parse any kind of file whose iFilter is installed on the machine

I got this code sample from CodeProject. There is a project by the name of Office Desktop Search. I found the below very useful so i thought i should post it here.

And it doesnt matter to my boss using Internet at the workplace danish bhai. They dont even know i have something known as USB Flash Drive (256MB) in my pocket everytime;) but i try to be as sincere as possible!!! I hv never copied any office file on the usb :shifty: :rolleyes:

Looks like as if we are

Umer

// Sample for using the below class
public string ParseFile(string filename)
{
  if (Parser.IsParseable(filename) == true)
  {
    return Parser.Parse(path)
  }
  else
  {
   return "" // or throw an exception or whatever
  }
}

// Add all this code to another file and place it with the class posted by the dtmunir
using System;
using System.Runtime.InteropServices;
using System.Text;

namespace OfficeFileReader
{
	/// <summary>
	/// Summary description for Parser.
	/// </summary>
	public class Parser
	{
  public Parser()
  {
  }

  [DllImport("query.dll", CharSet = CharSet.Unicode)] 
  private extern static int LoadIFilter (string pwcsPath, ref IUnknown pUnkOuter, ref IFilter ppIUnk); 

  [ComImport, Guid("00000000-0000-0000-C000-000000000046")] 
  [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)] 
  private interface IUnknown 
  { 
  	[PreserveSig] 
  	IntPtr QueryInterface( ref Guid riid, out IntPtr pVoid ); 

  	[PreserveSig] 
  	IntPtr AddRef(); 

  	[PreserveSig] 
  	IntPtr Release(); 
  } 


  private static IFilter loadIFilter(string filename)
  {
  	IUnknown iunk = null; 
  	IFilter filter = null;

  	// Try to load the corresponding IFilter 
  	int resultLoad = LoadIFilter( filename, ref iunk, ref filter ); 
  	if (resultLoad != (int)IFilterReturnCodes.S_OK) 
  	{ 
    return null;
  	} 
  	return filter;
  }

/*
  private static IFilter loadIFilterOffice(string filename)
  {
  	IFilter filter = (IFilter)(new CFilter());
  	System.Runtime.InteropServices.UCOMIPersistFile ipf = (System.Runtime.InteropServices.UCOMIPersistFile)(filter);
  	ipf.Load(filename, 0);

  	return filter;
  }
*/

  public static bool IsParseable(string filename)
  {
  	return loadIFilter(filename) != null;
  }

  public static string Parse(string filename)
  {
  	IFilter filter = null;

  	try 
  	{
    StringBuilder plainTextResult = new StringBuilder();
    filter = loadIFilter(filename); 

    STAT_CHUNK ps = new STAT_CHUNK();
    IFILTER_INIT mFlags = 0;

    uint i = 0;
    filter.Init( mFlags, 0, null, ref i);

    int resultChunk = 0;

    resultChunk = filter.GetChunk(out ps);
    while (resultChunk == 0)
    {
    	if (ps.flags == CHUNKSTATE.CHUNK_TEXT)
    	{
      uint sizeBuffer = 60000;
      int resultText = 0;
      while (resultText == Constants.FILTER_S_LAST_TEXT || resultText == 0)
      {
      	sizeBuffer = 60000;
      	System.Text.StringBuilder sbBuffer = new System.Text.StringBuilder((int)sizeBuffer);
      	resultText = filter.GetText(ref sizeBuffer, sbBuffer);

      	if (sizeBuffer > 0 && sbBuffer.Length > 0)
      	{
        string chunk = sbBuffer.ToString(0, (int)sizeBuffer);
        plainTextResult.Append(chunk);
      	}
      }
    	}
    	resultChunk = filter.GetChunk(out ps);
    }
    return plainTextResult.ToString();
  	}
  	finally
  	{
    if (filter != null)
    	Marshal.ReleaseComObject(filter);
  	}
  }
	}
}

  • 0

This works too, but in all the approaches I have tried so far, I always get an application error, but only with pdf files:

(ReadFile.exe is the name of my assembly)

Font Capture: ReadFile.exe - Application Error

The instruction at "0x030a61b3" referenced memory at "0x03a823e8". The memory could not be "read"

This always happens when my program closes - it works perefctly fine until I exit Main()...

I wonder if this has something to do with the Adobe IFilter not being released properly?

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • For the foreseeable that is your choice. I'm interested tom try one, my wife was very pleased because one she is anti-social and driver chat annoys her and two more seriously there is a long history of drivers abusing women, it's rare, but it happens and more than it should. Sometimes she needs to get a late taxi and she says it may make her feel safer.
    • 5800x3d was chopped because it was too good, why bother releasing a new mid range.
    • Was it bad, sure, but I got it on PC so it was nowhere near as bad as the PS4/XBO versions. Besides, I spent around 200hrs in there and have yet to play the expansion for it. All in all I think I got my monies worth. Maybe I was lucky and didn't run into anything game breaking like others. Either way, $60 is my limit for a game, the point is I'm not going over that no matter what the title is or how much I want to play it. Thank god I don't suffer from FOMO.
    • spwave 0.9.0-1 by Razvan Serea spwave is a cross-platform audio editor designed for research and advanced analysis. It supports a wide range of audio formats, including WAV, AIFF, MP3, Ogg Vorbis, FLAC, ALAC, raw PCM, and more via plug-ins. spwave offers precise editing tools such as zoom, crop, fade in/out, gain adjustment, and region extraction. It enables detailed spectral and phase analysis and supports unlimited undo/redo. Users can drag and drop files, edit metadata, save labeled regions, and view multiple synchronized waveforms. Internally, spwave processes audio in 64-bit precision, ensuring high accuracy. It runs on Windows, macOS, and Linux, making it a reliable and flexible tool for audio research and editing. spwave has following features: Support for multiple platforms: Windows, macOS, Linux (Motif, gtk), etc. Support for WAV, AIFF, MP3, Ogg Vorbis, FLAC, ALAC, raw, and text files by using plug-ins. Support for many bits/samples: 8bits, 16bits, 24bits, 32bits, 32bits float, 64bits double. Converting the sampling frequency and the bits/sample of a file. Playing, zooming, cropping, deleting, extracting, etc. of a selected region. Fade-in, fade-out, gain adjustment, channel swapping, etc of a selected region. Editing file information that supports comments of WAV and AIFF, and ID3 tag of MP3. Analysis of a selected region using several analysis types, e.g. spectrum, smoothed spectrum, phase, unwrapped phase and group delay. Undoing and redoing without limitation of the number of times. Waveform extraction by drag & drop. Opening files by drag & drop. Autosaving of selected regions (you can do this by drag & drop also). Saving positions and regions as labels. Viewing some waveforms and setting regions synchronously. Almost all processing is 64 bits processing internally. Supported Formats: Read/Write: WAV, AIFF, AIFC, CAF, MP3, Ogg Vorbis, FLAC, ALAC (.caf, .mp4), WMA (Windows), APE, AU/SND, PARIS, NIST, IRCAM, raw PCM, text. Read-only: MPEG-2 Layer 3 MP3, RMP files with VBR support. With 64-bit internal processing, autosave capabilities, and synchronized multi-view waveform editing, spwave is a solid tool for anyone handling complex audio editing or acoustic research. spwave 0.9.0-1 changelog: Implemented CQT spectrum and CQT spectrogram (beta version). Implemented piano-key display for spectrum/spectrogram view. Implemented indication of musical note name in cursor information for spectrum/spectrogram view. Fixed a bug that spectrogram view after zoom-in with large factor sometimes freezes. Fixed a bug that scroll and zoom-out in spectrogram view after zoom-in with large factor do not work correctly. Fixed a bug that spectrogram view provides sometimes wrong time information. Fixed a bug that plugin errors sometimes cause a crash. Fixed a bug that the color of grid lines is wrong in printing. Optimized layout of spectrogram view for printing. Enhanced the function of waveform cropping from label information. Fixed a bug that some items in the preference dialog related to labels do not work. Added some items related to the region label in the preference dialog. Fixed a bug that drawing selected region in the log-frequency axis does not work correctly. Added partial support for the dark mode of Windows (the menu bar and the menus). Fixed a bug that the cursor to indicate current calculation position of spectrogram is sometimes not shown. Changed drawing of cursor information into that with white background so as to make the information legible. Fixed a bug that moving to the head by scrolling the overview display sometimes fails. Added feature of alignment of the view region between spectrum view and spectrogram view. Download: spwave 64-bit | spwave 32-bit | ~3.0 MB (Freeware) Download: spwave ARM64 | 2.9 MB Links: spwave Home page | Other OSes | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • Microsoft Weekly: redesigned Windows 11 Start menu, Xbox handheld is here, and more by Taras Buria This week's news recap is here. Fresh Windows 11 preview builds with the redesigned Start menu and Windows Vista flashbacks, the long-anticipated Xbox handheld, Patch Tuesday updates, gaming news, and more. Quick links: Windows 10 and 11 Windows Insider Program Updates are available Gaming news Great deals to check Windows 11 and Windows 10 Here, we talk about everything happening around Microsoft's latest operating system in the Stable channel and preview builds: new features, removed features, controversies, bugs, interesting findings, and more. And, of course, you may find a word or two about older versions. June 2025 Patch Tuesday updates are out. Windows 10 received KB5060533 with build numbers 19044.5965 and 19045.5965. Supported Windows 11 versions received KB5060842 and KB5060999 with build numbers 26100.4349, 22631.5472, and 22621.5472. Later, Microsoft released an out-of-band update to address problems with games with Easy Anti-Cheat, causing system restarts upon launch, and a couple of recovery updates. Microsoft launched Copilot Vision with Highlights for Windows. This feature enables AI to see what is happening on the screen and offer additional information, analysis, and context. Copilot Vision currently works with up to two apps, but its availability is limited to the United States (more countries are on the way, says Microsoft). Now, here is some useful stuff for Windows users: a neat third-party maintenance tool that can run various checks, troubleshooters, and repair utilities; a useful guide about personalizing OneDrive folders with a touch of color, and more. Windows Insider Program Here is what Microsoft released for Windows Insiders this week: Builds Canary Channel Dev Channel Build 26200.5641 This build introduces the recently announced Start menu redesign. It also packs Lock Screen widget improvements, Narrator enhancements, updates to the gamepad keyboard, and a lot of various fixes. Build 26200.5651 Another Dev build introduced a Settings app agent, Recall improvements, seconds for the calendar clock, context menu enhancements, and more. Beta Channel Build 26120.4250 The Beta build has the same changelog as the one from the Dev Channel. Build 26120.4441 The same build as 26200.5651 from the Dev Channel. Release Preview Channel Build 22631.5545 With build 22631.5545 for Widnows 11, Microsoft improved default browser settings and the Windows Share UI and fixed several bugs. Build 19045.6029 This build introduces improvements to app defaults and multiple fixes for Windows 10. The redesigned Start menu is the most exciting part of the new builds, but as usual, it is rolling out gradually. You can mitigate that by force-enabling the new Start menu using the ViVeTool app as described in our guide. Interestingly, the latest builds introduced a funny bug where Windows 11 plays the Windows Vista startup sound on boot. Microsoft acknowledged the issue and said it is working on a fix in future updates. Meanwhile, if you use the latest Dev and Beta builds, you will get to enjoy 2006 nostalgia each time you turn on your PC. Updates are available This section covers software, firmware, and other notable updates (released and coming soon) delivering new features, security fixes, improvements, patches, and more from Microsoft and third parties. This week's browser updates include a fresh Dev Channel update for Microsoft Edge and secure password deployment in Edge for organizations. The latter arrived in the Stable Channel on June 13 with version 137.0.3296.83. There was also a minor update for Firefox. The latter received version 139.0.4, which addressed several issues with the browser freezing when switching apps, failing to save wallpapers with proper names, and more. In addition to the update, Mozilla announced that Deepfake Detector is shutting down. The service will go dark on June 26, 2025. Moving to Office updates, we have some changes to the new Outlook, which will block more files and allow you to perform more tasks when offline. OneDrive for Mac now supports external disks, Clipchamp lets you trim videos by cutting out parts of the transcript, and OneNote now supports Copilot Notebooks. Microsoft also announced an update on the removal of Exchange Online Basic Authentication in Office 365. Here are other updates and releases you may find interesting: Rufus received an update to version 4.8 with performance improvements for Windows images. Microsoft is committing to upskilling 1 million UK workers in AI this year. Here are the latest drivers and firmware updates released this week: Intel 32.0.101.6881 WHQL graphics driver with a single fix for Overwatch 2. AMD 25.6.2 non-WHQL with support for FBC: Firebreak, The Alters, and more. On the gaming side Learn about upcoming game releases, Xbox rumors, new hardware, software updates, freebies, deals, discounts, and more. A lot happened on the gaming side this week. At the Sunday Game Showcase, Microsoft and ASUS announced two Xbox handhelds: the ROG Xbox Ally and the ROG Xbox Ally X. These portable consoles are a big deal for the world of handheld devices, as they run a special version of Windows, which was optimized for portable gaming consoles with fewer processes running in the background. As such, they offer much better battery life and performance. You can read more about how Microsoft optimized Windows 11 for handhelds in a separate article. Next, we have plenty of new games and DLCs announced at the showcase; here is a recap: Indiana Jones and the Great Circle received a new DLC called The Order of Giants. It will be available on all supported platforms this September. Call of Duty: Black Ops 7 made a surprise appearance at the showcase. Activision released a teaser trailer where the game takes players to a futuristic experience set in 2035. Grounded 2 was announced. The sequel of the game for people with arachnophobia is coming next month, offering gamers a new miniaturized survival adventure. Obsidian Entertainment revealed the release date of The Outer Worlds 2 and details about companions. At Fate's End by Spiritfarer was announced, a new action game about fighting family. It is coming to consoles and PC somewhere in 2026. Skybound Games revealed Invincible VS, a brutal 3v3 tag fighting game by former Killer Instinct developers. Anno 117: Pax Romana received a November release date. Ubisoft also unveiled a special Governor's Edition. Nvidia announced new games for its cloud-streaming gaming service, GeForce NOW. If you own one of the following games, you can play them on Nvidia's cloud. The new additions include Frosthaven Demo, Dune: Awakening, MindsEye, The Alters, Kingdom Two Crowns, and more. Mojang finally has a release date for Vibrant Visuals and Chase the Skies updates. On June 17, Minecraft will get its long-anticipated visual overhaul, new features, fresh mobs, and more. Deals and freebies Steam is running a new Next Fest, during which gamers can try hundreds of games for free. The event ends on June 16, 2025. The Epic Games Store is giving away Two Point Hospital, a humorous hospital builder simulator. As usual, more deals are available in this week's Weekend PC Games Deals article. Other gaming news includes the following: GOG store introduced the One-Click Mods feature with support for Fallout: London and others. Valve announced new accessibility details for game listings on Steam. Steam finally has a native client for Apple Silicon. To finish this week's gaming section, here is an editorial from Paul Hill exploring the new $80 cost frontier in modern gaming. Great deals to check Every week, we cover many deals on different hardware and software. The following discounts are still available, so check them out. You might find something you want or need. JBL Bar 1000 and 700 sound bars Ring Floodlight Cameras Geekom Mini IT12 mini PC - $499 | $200 off Amazon Kindle Scribe (16GB) - $299.99 | 25% off LG gram Pro 16" Copilot+ PCs - $1,499.99 | 25% off GameSir Super Nova Wireless Controller for PC and mobile - $44.99 | 25% off Intel Core Ultra 7 Desktop Processor 265K 5.5 GHz - $259.99 | $144 off 12TB Seagate IronWolf Pro HDD - $218.49 | 13% off Polk Audio React 7" Wireless Subwoofer - $99.99 | 50% off StreamMaster Plus2 4K Gaming Projector - $1,699 | 15% off AMD Ryzen 5 9600X - $179.99 | 35% off Sony BRAVIA 5 65 Inch TV Mini LED - $1,298 | 13% off This link will take you to other issues of the Microsoft Weekly series. You can also support Neowin by registering a free member account or subscribing for extra member benefits, along with an ad-free tier option. Microsoft Weekly image background by
  • Recent Achievements

    • Apprentice
      Wireless wookie went up a rank
      Apprentice
    • Week One Done
      bukro earned a badge
      Week One Done
    • One Year In
      Wulle earned a badge
      One Year In
    • One Month Later
      Wulle earned a badge
      One Month Later
    • One Month Later
      Simmo3D earned a badge
      One Month Later
  • Popular Contributors

    1. 1
      +primortal
      595
    2. 2
      ATLien_0
      277
    3. 3
      +FloatingFatMan
      181
    4. 4
      Michael Scrip
      148
    5. 5
      Steven P.
      111
  • Tell a friend

    Love Neowin? Tell a friend!