Topic: allocateDirect nonheap memory java problem
rwb1977
« on: 13. May 2010, 02:17:35 pm »

Hey there, we've run into a problem with direct memory not being garbage collected properly by the Java HotSpot JVM.  We have googled this and seen that others have hit the same problem.  If you keep allocating lots of direct memory buffers, you eventually get GL_OUT_OF_MEMORY errors or OutOfMemoryError from allocateDirect.  If we change Xith and JAGaToo to use only "allocate" instead of allocateDirect, we can run the same scenes just fine.

I recommend that, at least for texture buffers, we introduce some sort of high-level switch that specifies whether texture buffers are allocated on the heap or not.  This would be a great benefit to us.

horati
« Reply #1 on: 14. May 2010, 01:43:30 pm »

This is not a problem with the garbage collector or the JVM.  By design, allocating memory through any means bypassing the collector prevents its collection.  A system call to malloc or any other native allocation routine prevents collection.

By calling OpenGL's allocateDirect, the caller asserts that it will deallocate the memory.  If you are allocating these buffers directly, then it is your responsibility to deallocate them.  If Xith is doing it, then it is a bug.

Direct buffers (of any kind, not just OpenGL) are always considerably more efficient because they do not require copying data.  Using indirect buffers as you described would work; however, it would require two copies:
  • Copying data into buffers managed by the garbage collector
  • Copying data from managed buffers to buffers managed by the driver
By using direct buffers, you reduce this to 1 copy from your program (in whatever manner you or your scenegraph stores it) to the buffer managed by the driver.  This should dramatically improve framerates although I do not have numbers to support this.

From your post, I infer that it is a bug in Xith.  Can you narrow it down some?

Kevin
"It may not seem like a big deal, but ignorance of character encoding issues leads to insidious code rot akin to y2k."
http://stackoverflow.com/users/3474/sylvarking
rwb1977
« Reply #2 on: 25. May 2010, 10:43:30 pm »

Non-heap memory from allocateDirect is managed by the JVM through garbage collection.  It is a bug in the JVM as far as I understand it.  You can test this by writing a program which allocates a temporary variable via allocateDirect thousands of times.  If you force System.gc(), the program will not crash, but without the System.gc() it will eventually crash.  That is not to say that normal garbage collection never deallocates memory for allocateDirect; it appears to be a timing issue, in that it doesn't always happen in a robust, timely manner.  This is why a high-level switch for Xith is desired: to avoid the shortcomings of the JVM.
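
One way to observe this without waiting for a crash is to read the JVM's direct buffer pool counters. The following is a minimal sketch, assuming a Java 7 or later JVM (BufferPoolMXBean is not available in earlier releases) and assuming the pool is named "direct", as it is on HotSpot. The class name DirectMemoryProbe is made up for illustration.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

class DirectMemoryProbe
{
    /** Returns the bytes currently held by the "direct" buffer pool, or -1 if the JVM does not report one. */
    static long directBytesUsed()
    {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class))
        {
            // Assumption: the pool name "direct" is what HotSpot uses.
            if ("direct".equals(pool.getName()))
            {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args)
    {
        long before = directBytesUsed();
        ByteBuffer bb = ByteBuffer.allocateDirect(1000000);
        long after = directBytesUsed();
        System.out.println("Direct pool: " + before + " -> " + after
                + " bytes after allocating " + bb.capacity() + " bytes");
    }
}
```

Printing this counter inside the allocation loop should show whether the pool keeps growing between collections. On HotSpot the ceiling for direct memory can be lowered with -XX:MaxDirectMemorySize, which may make the failure quicker to reproduce.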

horati
« Reply #3 on: 26. May 2010, 12:20:13 pm »

From Javadoc for ByteBuffer
Quote
A direct byte buffer may be created by invoking the allocateDirect factory method of this class. The buffers returned by this method typically have somewhat higher allocation and deallocation costs than non-direct buffers. The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious. It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations. In general it is best to allocate direct buffers only when they yield a measureable gain in program performance.

A search of the codebase for allocateDirect found
  • 1 match in a -tk demo
  • 3 matches in loaders
  • 1 match in TextureImage3D

Oh wait, I just re-read your post and noticed that you stated jagatoo is using allocateDirect.  I do not have the jagatoo source code installed on my machine.  I see NO reason for jagatoo to use allocateDirect and further believe there are bugs in the uses of allocateDirect discovered above.  It seems that allocateDirect has been used indiscriminately; that needs to be fixed.
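
The Javadoc's advice to reserve direct buffers for large, long-lived uses suggests reusing one buffer instead of allocating a temporary one per operation. Here is a minimal sketch of that idea, assuming a single-threaded caller. ReusedDirectBuffer and acquire() are hypothetical names, not part of Xith or JAGaToo.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ReusedDirectBuffer
{
    // The one long-lived direct buffer; replaced only when a larger one is needed.
    private ByteBuffer scratch;

    /** Returns a cleared direct buffer whose limit is exactly minCapacity bytes. */
    ByteBuffer acquire(int minCapacity)
    {
        if (scratch == null || scratch.capacity() < minCapacity)
        {
            scratch = ByteBuffer.allocateDirect(minCapacity).order(ByteOrder.nativeOrder());
        }
        scratch.clear();
        scratch.limit(minCapacity);
        return scratch;
    }

    public static void main(String[] args)
    {
        ReusedDirectBuffer pool = new ReusedDirectBuffer();
        for (int i = 0; i < 1000; i++)
        {
            // Every iteration gets the same native memory back; nothing new is allocated.
            ByteBuffer bb = pool.acquire(1000000);
            bb.putInt(0, i);
        }
        System.out.println("Finished 1000 iterations with a single direct allocation");
    }
}
```

The trade-off is that callers must not hold on to the returned buffer across calls, since the next acquire() hands out the same memory.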
« Last Edit: 26. May 2010, 12:22:59 pm by horati »

rwb1977
« Reply #4 on: 26. May 2010, 02:29:29 pm »

I understand the docs for allocateDirect, but there is indeed a problem with allocateDirect in some circumstances: it holds on to temporary buffers that take up memory when they should be garbage collected.

For example, here is a program that eventually throws an out-of-memory error when run.  If you inspect it, the buffer being allocated is temporary, and under normal Java programming practices you shouldn't have to worry about temporary buffers.

Code:
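import java.nio.ByteBuffer;

public class MemoryTest
{
    public static void main(String[] args)
    {
        long size = 0;
        while (true)
        {
            // Temporary direct buffer: unreachable after each iteration, but its
            // native memory is released only once the buffer object is collected.
            ByteBuffer bb = ByteBuffer.allocateDirect(1000000);

            size += bb.capacity();
            System.out.println("Now allocated: " + (size / 1024 / 1024) + " MB");

            try
            {
                Thread.sleep(10);
            }
            catch (InterruptedException e)
            {
                // Ignored: the sleep only throttles the loop.
            }
        }
    }
}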

The same program using allocate will run continually, because the garbage collector kicks in at the right time:

Code:
import java.nio.ByteBuffer;

public class MemoryTest
{
    public static void main(String[] args)
    {
        long size = 0;
        while (true)
        {
            // Heap buffer: heap pressure triggers the collector, so it is reclaimed in time.
            ByteBuffer bb = ByteBuffer.allocate(1000000);

            size += bb.capacity();
            System.out.println("Now allocated: " + (size / 1024 / 1024) + " MB");

            try
            {
                Thread.sleep(10);
            }
            catch (InterruptedException e)
            {
                // Ignored: the sleep only throttles the loop.
            }
        }
    }
}

If the program sets the reference to null and then calls the garbage collector explicitly, the allocateDirect version runs continually as well.

Code:
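import java.nio.ByteBuffer;

public class MemoryTest
{
    public static void main(String[] args)
    {
        long size = 0;
        while (true)
        {
            ByteBuffer bb = ByteBuffer.allocateDirect(1000000);

            size += bb.capacity();
            System.out.println("Now allocated: " + (size / 1024 / 1024) + " MB");

            try
            {
                Thread.sleep(10);
            }
            catch (InterruptedException e)
            {
                // Ignored: the sleep only throttles the loop.
            }

            // Drop the reference and request a collection so the native memory
            // behind the direct buffer is released before the next allocation.
            bb = null;
            System.gc();
        }
    }
}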

Because of this, I'm not sure how robust it is for our purposes to rely on allocateDirect, and thus my previous request for some sort of higher level switch.

I did find, however, that making all float buffers use allocate instead of allocateDirect resulted in some illegal state exceptions.  It appears that gluegen-rt relies on some of the buffers being allocated directly.  However, I have found that textures are safe to use with allocate instead of allocateDirect.
The gluegen-rt problem had to do with getting the values for a matrix from a directly allocated float buffer.
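
If only some native calls need direct buffers, one option is to convert at the boundary instead of allocating everything directly. The following is a minimal sketch, assuming the native side expects a direct FloatBuffer in native byte order. DirectGuard and ensureDirect() are hypothetical helpers, not gluegen-rt or JAGaToo API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class DirectGuard
{
    /** Returns src itself if it is already direct, otherwise a direct copy of its remaining floats. */
    static FloatBuffer ensureDirect(FloatBuffer src)
    {
        if (src.isDirect())
        {
            return src;
        }
        int bytes = src.remaining() * (Float.SIZE / 8);
        FloatBuffer direct = ByteBuffer.allocateDirect(bytes)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        // Copy through a duplicate so the caller's position and limit stay untouched.
        direct.put(src.duplicate());
        direct.flip();
        return direct;
    }

    public static void main(String[] args)
    {
        FloatBuffer heapMatrix = FloatBuffer.wrap(new float[16]);
        FloatBuffer forNative = ensureDirect(heapMatrix);
        System.out.println("heap direct? " + heapMatrix.isDirect()
                + ", converted direct? " + forNative.isDirect());
    }
}
```

This keeps small buffers such as matrices direct where the binding insists on it, while texture data can stay on the heap.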

horati
« Reply #5 on: 27. May 2010, 12:57:19 am »

Quote
...there is indeed a problem with allocateDirect in some circumstances, of it holding on to temporary buffers that are taking up memory when they should be garbage collected.

Yes.  It is documented and essentially says do not do it unless you know exactly when/how to do it correctly.  Apparently, we (the collective Xith Project) did not do it correctly.

I have no doubt that you can create code to exercise the problem and sometimes avoid it.

Quote
Because of this, I'm not sure how robust it is for our purposes to rely on allocateDirect, and thus my previous request for some sort of higher level switch.

As a rule of thumb, I would say that we should NOT use it at all unless we specifically document each call; i.e., why we did it and what documentation tells us it is safe to do it.

Quote
I did find, however, that making all float buffers use allocate instead of allocateDirect resulted in some illegal state exceptions.  It appears that gluegen-rt relies on some of the buffers being allocated directly.  However, I have found that textures are safe to use with allocate instead of allocateDirect.
The gluegen-rt problem had to do with getting the values for a matrix from a directly allocated float buffer.

I suspect that our usage of it at all started at the line(s) of code you found; i.e., it HAS to be called there because the graphics APIs want a matrix inside the memory of the graphics card.  BTW, such a buffer would not be collected by the JVM garbage collector.  That kind of buffer has to be collected manually through non-Java calls.

Marvin Fröhlich
« Reply #6 on: 27. May 2010, 02:09:01 am »

I have changed JAGaToo and Xith to always create buffers through JAGaToo's BufferUtils class. This class has a static switch that controls whether direct buffers may be used (the default is to allow them). This should help you. I am curious about the performance you will see.
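
The actual BufferUtils source is not shown in this thread, so the following is only a sketch of what such a static switch might look like. SwitchableBuffers, useDirectBuffers, and both factory methods are hypothetical names, not JAGaToo's API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class SwitchableBuffers
{
    /** Hypothetical global switch; true mirrors the "allow direct buffers" default. */
    static volatile boolean useDirectBuffers = true;

    static ByteBuffer createByteBuffer(int capacity)
    {
        ByteBuffer bb = useDirectBuffers
                ? ByteBuffer.allocateDirect(capacity)
                : ByteBuffer.allocate(capacity);
        // Native byte order is what OpenGL bindings generally expect.
        return bb.order(ByteOrder.nativeOrder());
    }

    static FloatBuffer createFloatBuffer(int floats)
    {
        // A float view is direct exactly when its backing byte buffer is direct.
        return createByteBuffer(floats * (Float.SIZE / 8)).asFloatBuffer();
    }

    public static void main(String[] args)
    {
        System.out.println("default: direct? " + createFloatBuffer(16).isDirect());
        useDirectBuffers = false;
        System.out.println("switched off: direct? " + createFloatBuffer(16).isDirect());
    }
}
```

Routing every allocation through one factory like this means the policy can be changed in a single place, which also makes it possible to refine the switch later, for example per buffer type.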

One thing seems strange to me. If direct buffers are not handled very carefully by the garbage collector, there should be a method to manually deallocate them. But there doesn't seem to be one.

Marvin

horati
« Reply #7 on: 27. May 2010, 03:11:30 am »

Quote
I have changed JAGaToo and Xith to always create buffers through JAGaToo's BufferUtils class. This class has a static switch that controls whether direct buffers may be used (the default is to allow them). This should help you. I am curious about the performance you will see.

If I understand what you did correctly, isn't this what he tried that caused the library to fail?  He turned off all the calls which caused it to fail because something inside JAGaToo was passed into a graphics API that was supposed to be inside graphics memory.

Quote
One thing seems strange to me. If direct buffers are not handled very carefully by the garbage collector, there should be a method to manually deallocate them. But there doesn't seem to be one.

Yup, there has to be one somewhere.  What would be really horrible is if the libraries on which we depend (lwjgl / jogl) failed to implement the call so we can free the buffer.

horati
« Reply #8 on: 27. May 2010, 03:50:44 am »

Doing a bit of research into the problem, we are basically in the same spot as JBoss; i.e., we need some aligned memory buffers and some speed.  It appears that ByteBuffer.allocateDirect() just calls malloc(), which is why RWB ran into a crash when he used allocate(), which uses Java's normal, arbitrarily aligned heap memory.  I had thought that we were allocating memory inside video RAM, which probably does have appropriate free calls available, but that is not the case.

RWB, you seem most affected by this problem.  Would you mind replacing the call to allocateDirect() in your situation with JNI calls to malloc() and posix_memalign()?  These 2 tests would narrow down the problem.  I suspect that we should be calling posix_memalign() and free() via JNI instead of calling allocateDirect() at all.  malloc() (and by extension allocateDirect) is probably word-aligned on your platform, but I believe this call should be posix_memalign() just to be safe.

FYI, you will want to wrap the byte[] returned via JNI using http://java.sun.com/j2se/1.5.0/docs/api/java/nio/ByteBuffer.html#wrap(byte[) ([] but the bbcode of the forum had problems) so that you have to write the least code.  Hopefully, it is less than 3 lines of changes inside org/jagatoo/util/nio/BufferUtils.java

Based on Marvin's post, I believe he already has some idea where we should place the calls to free().  If so, your tests will verify that we are on the right track on the allocation side and Marvin can add the calls to free() when he gets some time.

If all this works out, we can just wrap it inside JAGaToo's BufferUtils class and not have to worry about it again.


For more information about JBoss's bad experience using allocateDirect, interested parties can have a look at http://community.jboss.org/thread/82231.  I have not gone into the source of allocateDirect() to verify the assertion made in that thread that states
Quote
Jason told me on a pvt email that there is a Sleep(100ms) and a System.gc call for every direct buffer you create. wow!
meaning that each call to allocateDirect() triggers a sleep(100).
« Last Edit: 27. May 2010, 04:01:43 am by horati »

Marvin Fröhlich
« Reply #9 on: 27. May 2010, 03:24:31 pm »

Quote
If I understand what you did correctly, isn't this what he tried that caused the library to fail?  He turned off all the calls which caused it to fail because something inside JAGaToo was passed into a graphics API that was supposed to be inside graphics memory.

Hmm... maybe I got something wrong. Anyway, this switch shouldn't hurt ;).

Quote
Based on Marvin's post, I believe he already has some idea where we should place the calls to free().

I thought about the finalize() methods of the encapsulating classes. Shouldn't this work?

To use posix_memalign() we'll have to create native libraries for every supported platform, won't we? Hmm... this sounds like we should do this in cooperation with other projects, since we might not be the only ones who need this.

If we have to do this, we should also consider adding some other utility methods that speed up performance by using native code. I am thinking of methods like Arrays.fill(), which don't use native code and could be boosted this way. But this is just a side note and should be the topic of another thread when the time has come.

Marvin

horati
« Reply #10 on: 27. May 2010, 05:20:20 pm »

Quote
Hmm... maybe I got something wrong. Anyway, this switch shouldn't hurt ;).

It shouldn't hurt; he already tried the equivalent.  Using false globally caused abnormal termination when native code eventually tried to use an unaligned matrix created in Java code.

Quote
I thought about the finalize() methods of the encapsulating classes. Shouldn't this work?

It should work 95% of the time.  There are several situations where finalization cannot be guaranteed, especially when the JVM terminates.  As long as we limit the approach to the use of regular memory buffers, it should be fine.  Normal memory buffers are all eliminated when the JVM terminates so these would be covered in 100% of the cases.  If we used this approach for native resources like chunks of video RAM, then it could cause the user to need to reboot after some number of JVM terminations.
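
Because finalization cannot be guaranteed, a common pattern is to offer an explicit release and keep the collector-driven path only as a safety net. The following is a minimal sketch that swaps finalize() for java.lang.ref.Cleaner, which requires Java 9 or later and so postdates this thread. NativeResource is a hypothetical class, and the flag it sets stands in for a real native free() call.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

class NativeResource implements AutoCloseable
{
    private static final Cleaner CLEANER = Cleaner.create();

    /** Release action; it must not reference the NativeResource, or the object could never be collected. */
    private static final class Release implements Runnable
    {
        private final AtomicBoolean released;

        Release(AtomicBoolean released)
        {
            this.released = released;
        }

        @Override
        public void run()
        {
            // Stand-in for freeing native memory, e.g. via a JNI call to free().
            released.set(true);
        }
    }

    final AtomicBoolean released = new AtomicBoolean(false);
    private final Cleaner.Cleanable cleanable;

    NativeResource()
    {
        // Safety net: runs Release if the object becomes unreachable without close().
        cleanable = CLEANER.register(this, new Release(released));
    }

    /** Deterministic release; Cleanable.clean() runs the action at most once. */
    @Override
    public void close()
    {
        cleanable.clean();
    }

    public static void main(String[] args)
    {
        try (NativeResource res = new NativeResource())
        {
            System.out.println("released inside block: " + res.released.get());
        }
        System.out.println("Resource closed deterministically on leaving the try block");
    }
}
```

As with finalize(), the safety net gives no guarantee at JVM shutdown, so it is only suitable for ordinary process memory that the operating system reclaims anyway.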

Quote
To use posix_memalign() we'll have to create native libraries for every supported platform, won't we? Hmm... this sounds like we should do this in cooperation with other projects, since we might not be the only ones who need this.

If we have to do this, we should also consider adding some other utility methods that speed up performance by using native code. I am thinking of methods like Arrays.fill(), which don't use native code and could be boosted this way. But this is just a side note and should be the topic of another thread when the time has come.

I agree on all counts.

Marvin Fröhlich
« Reply #11 on: 27. May 2010, 11:24:35 pm »

Quote
It should work 95% of the time.  There are several situations where finalization cannot be guaranteed, especially when the JVM terminates.  As long as we limit the approach to the use of regular memory buffers, it should be fine.  Normal memory buffers are all eliminated when the JVM terminates so these would be covered in 100% of the cases.  If we used this approach for native resources like chunks of video RAM, then it could cause the user to need to reboot after some number of JVM terminations.

I don't think we ever actually get access to video RAM. This is one thing that I would like to see improved in LWJGL and other OpenGL bindings. If I update a texture, I write data to a buffer and push this buffer to the OpenGL binding, which copies the data to VRAM. Or if I read texture data, I create a native buffer and ask the OpenGL binding to copy the data into this buffer. Sometimes it should be a lot faster if I could just read/write to a buffer that directly maps to VRAM.

Apart from that the operating system should free all allocated memory when the process terminates.

Marvin

horati
« Reply #12 on: 28. May 2010, 01:56:46 am »

Quote
I don't think we ever actually get access to video RAM. This is one thing that I would like to see improved in LWJGL and other OpenGL bindings. If I update a texture, I write data to a buffer and push this buffer to the OpenGL binding, which copies the data to VRAM. Or if I read texture data, I create a native buffer and ask the OpenGL binding to copy the data into this buffer. Sometimes it should be a lot faster if I could just read/write to a buffer that directly maps to VRAM.

That sucks.  Do the calls exist in the underlying OpenGL API?  If so, maybe the utilities we want to write belong in LWJGL so that is where we should contribute.

Quote
Apart from that the operating system should free all allocated memory when the process terminates.

But not memory allocated through privileged calls.  Only the things allocated through unprivileged calls such as malloc, calloc, posix_memalign, direct calls to _alloc_pages, etc.  I don't mean to beat a dead horse; I just want to bring up a reminder that process termination is not an absolute guarantee that resources are freed.

Marvin Fröhlich
« Reply #13 on: 28. May 2010, 02:43:12 am »

Quote
That sucks.  Do the calls exist in the underlying OpenGL API?  If so, maybe the utilities we want to write belong in LWJGL so that is where we should contribute.

I would have to look that up. I have dealt with Direct3D (overlay drawing) quite a lot recently. And there I access texture data by directly... hmmm... wait a second. There the data has to be copied, too. Direct3D does that for you, though in a pretty dumb way. It copies way too much. But that's a different story. Maybe we're not so badly equipped.

Quote
But not memory allocated through privileged calls.  Only the things allocated through unprivileged calls such as malloc, calloc, posix_memalign, direct calls to _alloc_pages, etc.  I don't mean to beat a dead horse; I just want to bring up a reminder that process termination is not an absolute guarantee that resources are freed.

I see.

Marvin