Step Into Referenced C++ Project from C# Project

+Red King · 4,678 · August 7, 2013

EDIT 2:

The latest problems and questions are most likely on the last page. This thread is now used for the C++ / C# questions I have (so as not to create a thread per question).

I have a C# (GUI) project that generates a fractal in the dumbest way possible, and, well, I thought it would be fun to code the performance-critical section in C++, a language I don't know.

So I added a blank C++ CLR Library project and, after a few failed attempts, managed to set it to 64-bit.

I referenced the C++ project from the C# portion.

The issue is that I cannot step into it. I pause where I call the C++ class and choose Step Into, but it doesn't step in.

What am I doing wrong here?

EDIT:

Some other questions:

Is a C# byte just an unsigned char in C++? (I am going to be dealing with BGR values.)

If I want to run a function in a thread in C++, is it true that I need to pass a struct pointer containing all the values?

Edited August 13, 2013 by _Alexander

sao123 · August 7, 2013

Did you compile the C++ into a DLL, then make a C# header file (Class wrapper) to reference the DLL?

If you are mixing code, I believe they must be compiled separately.

Andre S. · 8,753 · August 7, 2013

Make sure you check "Unmanaged code debugging" in your C# project options.

Indeed, a C# byte is an unsigned 8-bit value, exactly like a C++ "unsigned char".

Keep in mind "int" is a 32-bit integer in C# (it's an alias for System.Int32) while "int" is platform-dependent in C++. Stick to cstdint types to avoid possible confusion there.
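A minimal sketch of that type mapping using <cstdint>, so the sizes are spelled out and checked at compile time. The Bgr struct and MakeBgr helper are hypothetical names, and the blue-first byte order assumes a 24bpp GDI+ bitmap.

```cpp
#include <climits>
#include <cstdint>

// Fixed-width equivalents of the C# types used in this thread:
//   C# byte -> std::uint8_t (a typedef for unsigned char on MSVC)
//   C# int  -> std::int32_t (C# int is always System.Int32)
//   C# long -> std::int64_t (System.Int64)
static_assert(CHAR_BIT == 8, "C# byte is exactly 8 bits");
static_assert(sizeof(std::int32_t) == 4, "C# int is 4 bytes");
static_assert(sizeof(std::int64_t) == 8, "C# long is 8 bytes");

// Hypothetical helper type: one pixel of a 24bpp GDI+ bitmap, stored blue first.
struct Bgr {
    std::uint8_t b;
    std::uint8_t g;
    std::uint8_t r;
};

// Builds a pixel from the usual R, G, B order so call sites don't mix up the layout.
inline Bgr MakeBgr(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return Bgr{b, g, r};
}
```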

I'm not too familiar with standard threads in C++, seeing as they're a novelty (they arrived with C++11), but from a quick glance at the documentation it looks like you pass a function and each of its arguments separately. Actually, since you're doing C++/CLI, you probably just want to stick with System::Threading::Thread.
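A sketch of the "function plus separate arguments" style with std::thread, so no struct pointer is needed. It assumes the code sits in a native .cpp compiled without /clr (Visual C++ rejects <thread> under /clr); FillRows and FillParallel are hypothetical names and the pixel values are placeholders.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical worker: fills rows [from, to) of a 24bpp BGR buffer with placeholder values.
void FillRows(std::uint8_t* pixels, int from, int to, int width, int stride) {
    for (int y = from; y < to; ++y) {
        std::uint8_t* row = pixels + static_cast<std::ptrdiff_t>(y) * stride;
        for (int x = 0; x < width; ++x) {
            row[3 * x + 0] = static_cast<std::uint8_t>(x);  // B
            row[3 * x + 1] = static_cast<std::uint8_t>(y);  // G
            row[3 * x + 2] = 0;                             // R
        }
    }
}

// Splits the rows into one band per hardware thread. std::thread takes the function
// followed by its arguments, which it copies into the new thread.
void FillParallel(std::uint8_t* pixels, int width, int height, int stride) {
    const int threadCount = static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
    const int band = (height + threadCount - 1) / threadCount;  // rows per thread, rounded up
    std::vector<std::thread> workers;
    for (int from = 0; from < height; from += band) {
        const int to = std::min(from + band, height);
        workers.emplace_back(FillRows, pixels, from, to, width, stride);
    }
    for (std::thread& t : workers) {
        t.join();  // wait for every band before the caller uses the buffer
    }
}
```

Joining every thread before returning keeps the buffer's lifetime simple: the caller owns it and can read it as soon as FillParallel returns.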

Also keep in mind there won't be significant performance advantages to doing numeric code in C++ unless you compile that specific code with /clr disabled and all optimisations on. Actually C++ tends to be slower than C# in debug builds.

+Red King · 4,678 · August 8, 2013

Actually, it turns out that because I hadn't defined a body for the function, the debugger just skipped over it.

Once I defined the body, it stepped in.

Anyway, I don't like C++/CLI anymore. Just look at how nasty it is (this is the only way I managed to start a thread):

System::Collections::Generic::List<Thread^>^ list = gcnew System::Collections::Generic::List<Thread^>(LogicalProcessors());
for (int i = 0; i < HEIGHT; i += amount)
{
    int to = System::Math::Min(i + amount, HEIGHT);
    ParameterizedThreadStart^ st = gcnew ParameterizedThreadStart(this, &Fractal::Parallel);
    Thread^ th = gcnew System::Threading::Thread(st);
    array<Int32>^ value2 = {i, to, bitmapData->Stride, cCount};
    th->Start(value2);
    list->Add(th);
}

And it is slightly slower.

And I finally managed to get plain native C++ to work:

extern "C"
{
    __declspec(dllexport) int Test(int a, int b)
    {
        return a + b;
    }
}
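For reference, a portable variant of the export above. extern "C" turns off C++ name mangling, which is what lets [DllImport] find the function by its plain name, and std::int32_t pins the parameters to the same size as C# int. FRACTAL_API is a hypothetical macro name; the non-Windows branch exists only so the snippet also compiles outside Visual C++.

```cpp
#include <cstdint>

#if defined(_WIN32)
#  define FRACTAL_API extern "C" __declspec(dllexport)
#else
#  define FRACTAL_API extern "C"
#endif

// Exported under the unmangled name "Test"; int32_t matches C# int exactly.
FRACTAL_API std::int32_t Test(std::int32_t a, std::int32_t b) {
    return a + b;
}
```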

Next I will try making it return a BGR byte array; after that, the C++ extensions from Intel and NVIDIA look interesting...

Andre S. · 8,753 · August 8, 2013

Well, if you understand the memory models of .NET and native C++ perfectly, then C++/CLI is no more difficult than combining both all the time :P That is to say, it's a total cluster###### (four different concepts of reference! *, &, ^, %). But I still like it for any kind of involved interop code; P/Invoke gets a lot messier a lot faster, IMO.

+Red King · 4,678 · August 8, 2013

So I hacked together the plain native C++ version. Here is the performance difference (when debugging):

Safe = Managed C#

Unsafe = C++ library

Safe V. Unsafe
6255.649 V. 296.4006
6318.052 V. 405.6007
6318.0506 V. 280.7993
6520.8494 V. 343.2054
6458.4503 V. 405.6058
6255.6451 V. 265.2043
6427.2456 V. 327.6034
6318.0526 V. 421.202
6333.6444 V. 421.2078
7300.8531 V. 483.6044
UNSAFE WINS = 10
SAFE WINS = 0
SAFE COUNT = 64506.4921
UNSAFE COUNT = 3650.4337

Does anyone know how to marshal a byte* returned from an extern function into another byte* in C#?

Andre S. · 8,753 · August 9, 2013

That's an unusually large difference; I suspect your C# version could be improved. Are you by any chance doing byte-per-byte copies rather than using Array.Copy or Marshal.Copy?

Quote
Does anyone know how to marshal a byte* returned from an extern function into another byte* in C#?

Can't you just use IntPtr?

Salutary7 · August 9, 2013

My suggestion would be to work in either managed code (including C++/CLI) or unmanaged code, but not both, just to avoid all the hassle. But understandably, sometimes that can't be helped, especially since getting both the feature set of .NET and the speed of optimized machine code is always tempting. Something like this http://stackoverflow.com/questions/7679522/return-array-of-integers-from-cross-platform-dll might be what you're looking for?

+Red King · 4,678 · August 9, 2013

On 09/08/2013 at 01:47, Salutary7 said:

My suggestion would be to work in either managed code (including C++/CLI) or unmanaged code, but not both, just to avoid all the hassle. But understandably, sometimes that can't be helped, especially since getting both the feature set of .NET and the speed of optimized machine code is always tempting. Something like this http://stackoverflow.com/questions/7679522/return-array-of-integers-from-cross-platform-dll might be what you're looking for?

Actually, I am just trying everything out.

Right now I am trying to figure out the best way to generate a byte array in C++ and pass it to C# efficiently, with no memory leaks.

In C++ I have this:

extern "C"
{
    __declspec(dllexport) unsigned char* Generate(
        int width, int height, int iterations, double cReal, double cImaginary, double minX, double maxX, double minY, double maxY,
        int cCount, int stride
    )
    {
        Fractal* f = new Fractal(width, height, iterations, cReal, cImaginary, minX, maxX, minY, maxY);
        unsigned char* ptr = f->Run(cCount, stride);
        f->~Fractal();
        return ptr;
    }
}

I have no idea if explicitly calling the destructor is needed in the case of extern.

The destructor for the class only delete[]s the int* array (the iteration count for each coordinate), not the unsigned char* memory.

This is imported into the C# code:

[DllImport(@"E:\SHARED\FractalViewer\x64\Debug\PureCPP.dll")]
private static extern IntPtr Generate( //byte* Generate(
    int width, int height, int iterations, double cReal, double cImaginary, double minX, double maxX, double minY, double maxY,
    int cCount, int stride
);

But I am not sure how to do the code below properly (a copy at maximum speed without an intermediate array, plus proper deallocation):

BitmapData bmData = image.LockBits(
    new System.Drawing.Rectangle(0, 0, image.Width, image.Height),
    ImageLockMode.WriteOnly,
    image.PixelFormat);
IntPtr ptr = Generate(WIDTH, HEIGHT, iterations, cReal, cImaginary, minX, maxX, minY, maxY, 3, bmData.Stride);
// The code below feels bad, is there a way to use Marshal.SomeFunction(..) to do this?
UInt64* rgb = (UInt64*)ptr.ToPointer();
UInt64* img = (UInt64*)bmData.Scan0.ToPointer();
int imgSizeIn64 = (bmData.Stride * HEIGHT) >> 3;
for (int index = 0; index < imgSizeIn64; index++)
{
    img[index] = rgb[index];
}
// Marshal.FreeHGlobal(ptr); // Invalid Access to Memory Region
// Marshal.FreeBSTR(ptr); // Stops Working
// Marshal.FreeCoTaskMem(ptr); // Access Violation
// What do Here?
image.UnlockBits(bmData);
return image;


I just ran this for a 9000 x 9000 surface 26 times, and memory usage is sitting at 4 GB =(
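A rough size check, assuming the 3-bytes-per-pixel format implied by cCount = 3 in the call above: this is how much each buffer returned by Generate holds, so every one that is never freed costs that much (26 of them would be about 6.3 GB).

```latex
\begin{aligned}
\text{stride} &= 9000 \times 3 = 27\,000 \text{ bytes (already a multiple of 4)} \\
\text{buffer size} &= 27\,000 \times 9000 = 243\,000\,000 \text{ bytes} \approx 232 \text{ MiB}
\end{aligned}
```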

EDIT: This solution will probably get me punched in the nads:

[DllImport(@"E:\SHARED\FractalViewer\x64\Debug\PureCPP.dll")]
private static extern void DeAlloc(IntPtr ptr);
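A sketch of what the native side of that DeAlloc import could look like. It assumes the buffer was allocated with new[] inside the same DLL; AllocImage is a hypothetical stand-in for wherever Fractal::Run allocates, and FRACTAL_API is a hypothetical export macro. Memory has to go back through the allocator that produced it, which is why the Marshal.Free* calls fail on it.

```cpp
#include <cstddef>

#if defined(_WIN32)
#  define FRACTAL_API extern "C" __declspec(dllexport)
#else
#  define FRACTAL_API extern "C"
#endif

// Hypothetical stand-in for the allocation inside Fractal::Run.
// The trailing () zero-initialises the bytes.
FRACTAL_API unsigned char* AllocImage(int stride, int height) {
    return new unsigned char[static_cast<std::size_t>(stride) * height]();
}

// Native side of the DeAlloc import: memory from new[] in this DLL goes back
// through delete[] in this DLL. delete[] on a null pointer is a no-op.
FRACTAL_API void DeAlloc(unsigned char* ptr) {
    delete[] ptr;
}
```

On the C# side, DeAlloc(ptr) would then replace the commented-out Marshal.Free* attempts, after the copy and before returning.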

notchinese · August 9, 2013

On 08/08/2013 at 23:27, _Alexander said:
So I hacked together the normal C++ version, here is the performance difference (when debugging),

To accurately benchmark .NET code you must compile it in release mode, run it WITHOUT the debugger attached, and run the code before benchmarking it, so it will get optimized by the jitter.
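The same discipline is worth applying when timing the native side. A minimal sketch of a harness that does one untimed warm-up run before the measured ones; TimeRuns is a hypothetical helper, and it still assumes a Release build run without the debugger attached.

```cpp
#include <chrono>
#include <vector>

// Runs `work` once untimed (warm-up: caches, page faults, lazy initialisation),
// then `runs` timed repetitions; returns each duration in milliseconds.
template <typename F>
std::vector<double> TimeRuns(F work, int runs) {
    work();  // warm-up run, not measured
    std::vector<double> ms;
    ms.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}
```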

+Red King · 4,678 · August 9, 2013

On 09/08/2013 at 04:20, notchinese said:
To accurately benchmark .NET code you must compile it in release mode, run it WITHOUT the debugger attached, and run the code before benchmarking it, so it will get optimized by the jitter.

The results are not pretty:

9000 x 9000, 50 iterations, i5-3570K @ 4 GHz, 4 threads

Safe V. Unsafe
3223.0689 V. 3421.2571
3216.0616 V. 3420.2569
3222.0691 V. 3411.2463
3464.3002 V. 3419.2537
3246.0926 V. 3427.2612
3242.0882 V. 3433.2674
3447.2835 V. 3424.2587
3477.3106 V. 3426.2625
3221.0686 V. 3421.2555
3456.2926 V. 3422.2565
UNSAFE WINS = 4
SAFE WINS = 6
SAFE COUNT = 33215.6359
UNSAFE COUNT = 34226.5758

I guess I need to work on this hellish pile of what-the-hell:

inline int Fractal::GetIterationCount(double a, double b)
{
    int iterationCount = 0;
    double _a = a;
    double _b = b;

    double _aSq = a * a;
    double _bSq = b * b;
    for (int i = 0; i < iterations; i++)
    {
        _a = _aSq - _bSq;
        _b = 2 * a * b;
        _a += cReal;
        _b += cImaginary;
        _aSq = _a * _a;
        _bSq = _b * _b;
        if ((_aSq + _bSq) < thesholdSq)
        {
            iterationCount = i;
        }

        a = _a;
        b = _b;
    }

    return iterationCount;
}
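One thing that stands out in this loop is that it never stops early: it always runs all `iterations` steps, even after the point has escaped. A sketch of the common early-exit escape-time form (a swapped-in variant, not the code above), written as a free function so it compiles on its own; EscapeIteration and its parameter list are hypothetical. It tests before each step and returns `iterations` for points that never escape, so its counts are not identical to the loop above.

```cpp
// Hypothetical free-function version of the escape-time loop with an early exit.
// Iterates z -> z^2 + c starting from z = a + bi, with c = cReal + cImaginary*i.
inline int EscapeIteration(double a, double b, double cReal, double cImaginary,
                           int iterations, double thresholdSq) {
    for (int i = 0; i < iterations; ++i) {
        const double aSq = a * a;
        const double bSq = b * b;
        if (aSq + bSq > thresholdSq) {
            return i;  // escaped: no need to keep iterating
        }
        b = 2.0 * a * b + cImaginary;  // Im(z^2 + c), uses the old a
        a = aSq - bSq + cReal;         // Re(z^2 + c)
    }
    return iterations;  // never escaped within the budget
}
```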

Andre S. · 8,753 · August 9, 2013

extern "C"
{
    __declspec(dllexport) unsigned char* Generate(int width, int height, int iterations, double cReal, double cImaginary, double minX, double maxX, double minY, double maxY, int cCount, int stride)
    {
        Fractal* f = new Fractal(width, height, iterations, cReal, cImaginary, minX, maxX, minY, maxY);
        unsigned char* ptr = f->Run(cCount, stride);
        f->~Fractal();
        return ptr;
    }
}

In C++, use the stack as much as possible rather than new/delete. So your function becomes:

Fractal f(width, height, iterations, cReal, cImaginary, minX, maxX, minY, maxY);
return f.Run(cCount, stride);

No memory leaks.

Anything you allocate with new must be de-allocated with delete, so if you wanted to do it with new:

Fractal* f = new Fractal(width, height, iterations, cReal, cImaginary, minX, maxX, minY, maxY);
unsigned char* result = f->Run(cCount, stride);
delete f;
return result;

delete calls the destructor. In C++/CLI, you also have to deal with reference types implementing IDisposable; the same applies, except they're allocated with gcnew rather than new, and delete calls Dispose().

Your code seems weird because result points to an array allocated by the Fractal object, but you destroy that object before returning, so Fractal cannot free the array; someone else has to, which makes the code brittle. In general, when you're dealing with unmanaged resources, make sure the object responsible for allocating them is also responsible for deallocating them. For instance, here the caller could be responsible for pre-allocating the array and freeing it when it's done with the data.
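A sketch of that ownership split: the object lives on the stack and owns only its scratch data (std::vector frees it in the destructor), while the caller allocates and frees the output buffer. FractalSketch and GenerateInto are hypothetical, simplified stand-ins for the thread's Fractal and Generate, and the pixel values are placeholders.

```cpp
#include <cstddef>
#include <vector>

#if defined(_WIN32)
#  define FRACTAL_API extern "C" __declspec(dllexport)
#else
#  define FRACTAL_API extern "C"
#endif

// Hypothetical stand-in for Fractal: owns only its scratch data, never the output.
class FractalSketch {
public:
    FractalSketch(int width, int height)
        : width_(width), height_(height),
          counts_(static_cast<std::size_t>(width) * height) {}

    // Writes grey BGR pixels into a buffer owned by the caller.
    void Run(unsigned char* out, int stride) {
        for (int y = 0; y < height_; ++y) {
            unsigned char* row = out + static_cast<std::ptrdiff_t>(y) * stride;
            for (int x = 0; x < width_; ++x) {
                const int n = x + y;  // placeholder for a real iteration count
                counts_[static_cast<std::size_t>(y) * width_ + x] = n;
                row[3 * x] = row[3 * x + 1] = row[3 * x + 2] = static_cast<unsigned char>(n);
            }
        }
    }

private:
    int width_;
    int height_;
    std::vector<int> counts_;  // released automatically by the destructor
};

// Hypothetical export: the caller allocates `out` (stride * height bytes) and frees it.
FRACTAL_API void GenerateInto(unsigned char* out, int width, int height, int stride) {
    FractalSketch f(width, height);  // stack object, destroyed automatically on return
    f.Run(out, stride);
}
```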

Or just use managed types instead.

Quote
Right now I am trying to figure out the best way to generate a byte array in C++ and pass it to C# efficiently, with no memory leaks.

For an unmanaged array:

unsigned char* arr = new unsigned char[dim];
// pass to C# as IntPtr
delete[] arr;

For a managed array:

array<unsigned char>^ arr = gcnew array<unsigned char>(dim);
// pass to C# as array<unsigned char>^

And no need to delete as it's GCed. Try to rely on managed types as much as possible to avoid leaks. Managed arrays can be temporarily pinned to access them using pointers.

// The code below feels bad, is there a way to use Marshal.SomeFunction(..) to do this?
UInt64* rgb = (UInt64 *) ptr.ToPointer();
UInt64* img = (UInt64*)bmData.Scan0.ToPointer();
int imgSizeIn64 = (bmData.Stride * HEIGHT) >> 3;
for (int index = 0; index < imgSizeIn64; index++)
{
    img[index] = rgb[index];
}

For unmanaged to unmanaged, use memcpy; you'll have to P/Invoke it from C# (hey, the example on that page is exactly what you're trying to do!). For managed to unmanaged or vice versa, use Marshal.Copy. For managed to managed, use Array.Copy. This is way faster than a byte-per-byte copy. You can probably speed up your code by 2-4 times just by using a copy function rather than a loop like this.
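On the native side, a sketch of the same idea with std::memcpy: one call when both buffers share a stride, one call per row when they don't. It also avoids the 8-byte loop dropping up to 7 trailing bytes when Stride * HEIGHT isn't a multiple of 8. CopyRows is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstring>

// Copies `height` rows of `rowBytes` bytes from src to dst, whose strides may differ.
inline void CopyRows(unsigned char* dst, int dstStride,
                     const unsigned char* src, int srcStride,
                     int rowBytes, int height) {
    if (dstStride == srcStride && srcStride > 0) {
        // Identical layout: copy the whole block, padding included, in one call.
        std::memcpy(dst, src, static_cast<std::size_t>(srcStride) * height);
        return;
    }
    for (int y = 0; y < height; ++y) {
        std::memcpy(dst + static_cast<std::ptrdiff_t>(y) * dstStride,
                    src + static_cast<std::ptrdiff_t>(y) * srcStride,
                    static_cast<std::size_t>(rowBytes));
    }
}
```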

// Marshal.FreeHGlobal(ptr); // Invalid Access to Memory Region
// Marshal.FreeBSTR(ptr); // Stops Working
// Marshal.FreeCoTaskMem(ptr); // Access Violation
// What do Here?

Assuming ptr was allocated using new, just use delete. Anything you allocate with new, call delete on it when you're done. Anything you allocate with new[], call delete[] on it when you're done.

Marshal.FreeHGlobal is for Marshal.AllocHGlobal, etc.

That said, your life would be a lot simpler if you used a managed array and Marshal.Copy'd it into the bitmap. You don't really gain performance by using an unmanaged array (it's just a blob of memory in any case); the possible gain here is having the C++ optimiser go over your inner loops and do its magic.

+Red King · 4,678 · August 11, 2013

OK, memcpy works. Awesome.

I use new on unsigned int* color and I delete[] it in the destructor.

So after the memcpy(), should the C# side (the "// pass to C# as IntPtr" part) call back into the unmanaged C++ code to delete[] the ptr?

I don't understand this part well. I also don't understand why the bitmap's internal pointer to the RGB array can't just be modified to avoid the memcpy altogether.

I am still struggling in three areas: C++ is only as fast as managed C#, not faster; I am still looking for a good coloring algorithm; and for some reason the image is flipped on the X axis.

I spent all of today researching __m128d and __m256d and made a homebrew __m128d implementation. The homebrew version was futile performance-wise; I will look on Google and Stack Overflow...

I also noticed that other fractal implementations use single rather than double precision. I will start working on a float version...

One thing I was struggling with was that the bitmap's Stride was not width*3; I think it was because width * height & 0xF was not zero.
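GDI+ bitmaps (like Windows DIBs) pad each row to a multiple of 4 bytes, which is the usual reason Stride differs from width*3. A sketch of the common formula for a 24bpp (Format24bppRgb) bitmap; Stride24 and BufferBytes24 are hypothetical helper names, and BitmapData.Stride remains the authoritative value to read at run time.

```cpp
#include <cstddef>

// Bytes per row of a 24bpp bitmap: 3 bytes per pixel, rounded up to a multiple of 4.
inline int Stride24(int width) {
    return (width * 3 + 3) & ~3;
}

// Total bytes for a 24bpp image, padding included.
inline std::size_t BufferBytes24(int width, int height) {
    return static_cast<std::size_t>(Stride24(width)) * height;
}
```

For example, a 5-pixel-wide row needs 15 bytes of pixel data but occupies 16, while 9000 pixels (27000 bytes) needs no padding at all.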

Andre S. · 8,753 · August 11, 2013

On 11/08/2013 at 05:12, _Alexander said:

OK, memcpy works. Awesome.

I use new on unsigned int* color and I delete[] it in the destructor.

So after the memcpy(), should the C# side (the "// pass to C# as IntPtr" part) call back into the unmanaged C++ code to delete[] the ptr?

I don't understand this part well. I also don't understand why the bitmap's internal pointer to the RGB array can't just be modified to avoid the memcpy altogether.

If you're creating an unmanaged array in C++, then you need to deallocate it in C++ when you're done with the data. This is certainly a messy approach; it'd be much better to create the buffer in C#, pass it to C++, have the C++ code fill it, and free the buffer in C# later. You could indeed just create a bitmap, lock the bits, pass that pointer to C++, and unlock the bits when C++ is done. That avoids any copy or manual memory management; just Dispose() the bitmap when you're done with it. That's how I'd do it in any case.
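A sketch of the native half of that approach: C# passes the locked BitmapData.Scan0 pointer, its Stride and the viewport, and C++ writes the pixels in place, with no copy and no C++-side allocation. All names are hypothetical. It also maps row 0 to maxY: rows in the locked buffer run from the top of the image down, so mapping row 0 to minY mirrors the image vertically. That may be the flip mentioned above, although this is an assumption since the original coordinate mapping isn't shown.

```cpp
#include <cstddef>
#include <cstdint>

struct Viewport { double minX, maxX, minY, maxY; };

// Column x -> real part (assumes width >= 2).
inline double ColumnToReal(const Viewport& v, int x, int width) {
    return v.minX + (v.maxX - v.minX) * x / (width - 1);
}

// Row y -> imaginary part; row 0 is the top of the image, so it maps to maxY (assumes height >= 2).
inline double RowToImaginary(const Viewport& v, int y, int height) {
    return v.maxY - (v.maxY - v.minY) * y / (height - 1);
}

// Fills a caller-locked 24bpp buffer in place. `shade` maps (re, im) to a 0..255 intensity.
template <typename Shade>
void FillBitmap(std::uint8_t* scan0, int width, int height, int stride,
                const Viewport& v, Shade shade) {
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = scan0 + static_cast<std::ptrdiff_t>(y) * stride;
        const double im = RowToImaginary(v, y, height);
        for (int x = 0; x < width; ++x) {
            const std::uint8_t c = shade(ColumnToReal(v, x, width), im);
            row[3 * x + 0] = c;  // B
            row[3 * x + 1] = c;  // G
            row[3 * x + 2] = c;  // R
        }
    }
}
```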

Quote

I am still struggling in three areas: C++ is only as fast as managed C#, not faster; I am still looking for a good coloring algorithm; and for some reason the image is flipped on the X axis.

I spent all of today researching __m128d and __m256d and made a homebrew __m128d implementation. The homebrew version was futile performance-wise; I will look on Google and Stack Overflow...

I also noticed that other fractal implementations use single rather than double precision. I will start working on a float version...

One thing I was struggling with was that the bitmap's Stride was not width*3; I think it was because width * height & 0xF was not zero.

SSE intrinsics are definitely worth investigating for your use case, but they have a steep learning curve. Make sure you're compiling with all optimisations on and /clr disabled for that specific function (put it in a separate .cpp so you can disable /clr just for that code). If you're not compiling as native code with all optimisations on, then it's useless to write it in C++. Keep in mind that anything above SSE2 has limited compatibility; AVX, for instance, was only introduced with AMD's Bulldozer and Intel's Sandy Bridge (both 2011), so your code will just crash on anything earlier. I'd generally stick with SSE2 for compatibility.
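A sketch of the escape-time loop evaluating two points per __m128d (lane 0 and lane 1) using only SSE2, which every x64 CPU supports. It uses an early-exit test (a swapped-in variant, not the GetIterationCount code from this thread) and reports `iterations` for points that never escape; EscapeIteration2 and its parameters are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Escape-time iteration z -> z^2 + c for two starting points at once.
// out[k] receives the first iteration at which lane k exceeded thresholdSq,
// or `iterations` if it never did.
inline void EscapeIteration2(const double a0[2], const double b0[2],
                             double cReal, double cImaginary,
                             int iterations, double thresholdSq, int out[2]) {
    __m128d a = _mm_loadu_pd(a0);  // lane 0 = a0[0], lane 1 = a0[1]
    __m128d b = _mm_loadu_pd(b0);
    const __m128d cr = _mm_set1_pd(cReal);
    const __m128d ci = _mm_set1_pd(cImaginary);
    const __m128d thr = _mm_set1_pd(thresholdSq);
    const __m128d two = _mm_set1_pd(2.0);
    out[0] = out[1] = iterations;
    int pending = 0x3;  // bit k set while lane k has not escaped yet
    for (int i = 0; i < iterations && pending != 0; ++i) {
        const __m128d aSq = _mm_mul_pd(a, a);
        const __m128d bSq = _mm_mul_pd(b, b);
        const __m128d escaped = _mm_cmpgt_pd(_mm_add_pd(aSq, bSq), thr);
        const int newly = _mm_movemask_pd(escaped) & pending;  // lanes escaping for the first time
        if (newly & 1) out[0] = i;
        if (newly & 2) out[1] = i;
        pending &= ~newly;
        // Escaped lanes keep iterating (possibly to inf/NaN) but are masked out above.
        b = _mm_add_pd(_mm_mul_pd(two, _mm_mul_pd(a, b)), ci);  // Im(z^2 + c)
        a = _mm_add_pd(_mm_sub_pd(aSq, bSq), cr);               // Re(z^2 + c)
    }
}
```

The loop keeps running until both lanes have escaped, so neighbouring pixels with similar counts make the best use of each pair.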
