Garbage collection is not cleaning all unused nodes

description

The garbage collection does not clean up all unused nodes. This leads to a growing database when data is written to it continuously. When the database is reopened, the unused nodes are cleaned up by the method OptimizeData, but in long-running applications this is unacceptable; they should be cleaned up by the garbage collection (method CleanUp).

It seems to me that there are two approaches to garbage collection: one when opening the database and one in the cleanup. Could these not be merged into a single approach?

comments

branislavrus wrote Mar 21, 2013 at 1:09 PM

Hi Mark,

I think that the first thing is to identify whether the problem is indeed in the IOG library or in the project which uses the IOG library. I have seen a couple of issues earlier which looked like IOG issues and which were later identified as issues in the other project.

I have seen the changes that you have made and the log you created. Unfortunately this does not give me enough information to identify the problem and fix it.

The best way to identify the problem is to create a unit test which exposes the problem. If you can create that unit test I can see what the problem is and fix it.

You should also send me some kind of scenario to reproduce the problem. But a scenario by itself requires more time from our side, which must be scheduled.

I hope that this was helpful.

Branislav

mpigge wrote Mar 22, 2013 at 7:43 AM

Hi Branislav,

I found the source of the problem. I added the following code to the GC method Cleanup of the Context class:
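            // Diagnostic only: print the list position of every snapshot that is still in use.
            List<Guid> snapshots = snapshotsService.ListSnapshots();
            for (int index = 0; index < snapshots.Count; index++)
            {
                // usedSnapshots comes from the surrounding Cleanup method.
                if (usedSnapshots.Contains(snapshots[index]))
                {
                    Console.WriteLine("Tsnapshot[" + snapshots[index] + "]=" + index);
                }
            }
            // A total that keeps growing means old snapshots are not being collected.
            Console.WriteLine("Number of snapshots = " + snapshots.Count);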
With this I can monitor where the in-use snapshots are on the timeline. In my case the number of snapshots was growing, because …

Consider that you have two threads:
• The first thread has a BlockingQueue and only performs actions which are added to the BlockingQueue. If there is an action in the BlockingQueue, UpdateWorkspace is executed and new data is committed to the database.
• The second thread is continuously committing data to the database.

In my situation the first thread has no actions to perform during the lifetime of the application. So the first thread stays at snapshot T = 0, while the second thread is continuously updating, and after each update its snapshot is T = T + 1. This explains why the number of snapshots is growing. Threads with their workspace data can be far apart from each other on the timeline, and the data for the complete timeline between those threads needs to be stored in the database so that commits remain possible. I assume that you cannot garbage collect the intermediate snapshots, because on commit you have merge actions to do.
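
The retention effect described above can be illustrated with a toy model. This is only a sketch, not IOG code: snapshots are reduced to integers in commit order, and the names SnapshotRetentionModel, RetainedCount and Simulate are made up for illustration. It assumes that every snapshot from the oldest one still in use up to the latest one has to be kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only (hypothetical names, not part of IOG).
// Snapshots are numbered 0, 1, 2, ... in commit order.
public static class SnapshotRetentionModel
{
    // Number of snapshots that must be kept, given the latest snapshot
    // and the snapshot each open workspace is currently positioned on.
    public static int RetainedCount(int latest, IEnumerable<int> inUse)
    {
        // With no open workspace only the latest snapshot is needed.
        int oldestInUse = inUse.DefaultIfEmpty(latest).Min();
        return latest - oldestInUse + 1;
    }

    // One idle thread that starts at snapshot 0 and one thread that commits
    // 'commits' times. If idleThreadUpdates is true, the idle thread moves
    // its workspace to the latest snapshot after every commit.
    public static int Simulate(int commits, bool idleThreadUpdates)
    {
        int idle = 0, writer = 0, latest = 0;
        for (int i = 0; i < commits; i++)
        {
            latest++;                               // T = T + 1
            writer = latest;                        // writer follows its own commit
            if (idleThreadUpdates) idle = latest;   // periodic workspace update
        }
        return RetainedCount(latest, new[] { idle, writer });
    }
}
```

In this model Simulate(1000, false) keeps 1001 snapshots, while Simulate(1000, true) keeps only 1, which matches the growth seen when the first thread never updates its workspace.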

I am not sure whether you could solve this problem within the IOG, but somehow this design constraint must be taken into consideration when developing applications on top of the IOG. This is a form of memory leak into the database, and maybe you could add some facilities to detect this kind of problem.

To avoid this I need to wake up the first thread at regular intervals and update its workspace; otherwise the garbage collector will not be able to clean up the old snapshots.
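
A minimal sketch of such an interval-based wake-up, assuming the queue is a .NET BlockingCollection<Action>. The class IdleWorker and the updateWorkspace delegate are hypothetical; the delegate stands in for whatever call moves this thread's workspace to the latest snapshot.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of IOG.
public static class IdleWorker
{
    // Processes queued actions and refreshes the workspace whenever the
    // queue stays empty for wakeUpInterval. Returns the number of refreshes
    // that happened while idle.
    public static int Run(BlockingCollection<Action> queue,
                          Action updateWorkspace,
                          TimeSpan wakeUpInterval)
    {
        int idleRefreshes = 0;
        while (!queue.IsCompleted)
        {
            Action action;
            if (queue.TryTake(out action, wakeUpInterval))
            {
                updateWorkspace();   // move to the latest snapshot first
                action();
            }
            else if (!queue.IsCompleted)
            {
                // Timed out with nothing to do: refresh anyway so this
                // thread no longer holds on to an old snapshot.
                updateWorkspace();
                idleRefreshes++;
            }
        }
        return idleRefreshes;
    }
}
```

The loop ends once CompleteAdding has been called on the queue and all remaining actions have been processed.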

Best regards,

Mark

mpigge wrote Mar 22, 2013 at 7:44 AM

Hi Mark,

You are correct with everything you have written.
When there are two threads and one of them keeps its workspace open on an old snapshot, all snapshots in between are kept for commit purposes (as you described).
This is the way the IOG works. If you change anything on an old snapshot and commit the changes, IOG knows how to merge those changes into the latest snapshot.
If the nodes from intermediate snapshots were removed, it would be impossible to implement the merge functionality.

In my opinion this is not a memory leak. As soon as the old snapshot is not used anymore, garbage collection will clean up all the unused nodes. As you are aware, we can easily detect this situation, but if we want to preserve the commit functionality we cannot dispose of the nodes which can be used in the future.

There are two ways you can fix this situation.
The first is the one you described: with a periodic workspace update all of the threads will use the latest snapshot and everything will be ok.
The other solution is simpler. Don't keep the workspace open while the thread is blocked. Open the workspace when that thread unblocks.
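
A minimal sketch of the second option, assuming the workspace type can be disposed to release its snapshot. OnDemandWorker and the openWorkspace delegate are hypothetical names; the delegate stands in for whatever call opens a workspace on the latest snapshot.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of IOG.
public static class OnDemandWorker
{
    // Blocks on the queue without holding a workspace. A workspace is
    // opened only for the duration of each action and disposed right after.
    public static void Run<TWorkspace>(
        BlockingCollection<Action<TWorkspace>> queue,
        Func<TWorkspace> openWorkspace)
        where TWorkspace : IDisposable
    {
        foreach (Action<TWorkspace> action in queue.GetConsumingEnumerable())
        {
            using (TWorkspace workspace = openWorkspace())
            {
                action(workspace);
            }
        }
    }
}
```

Because no workspace exists while the thread waits, the thread does not hold on to an old snapshot between actions.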

Branislav