After a lot of heavy use of the remote actors libraries in Scala, I noticed that something seemed to be leaking memory. Unable to see anything obvious which might have been leaking, I fired up a profiler to investigate.
The short version of this post is as follows:
Most of the memory leaks I found are contained within the Remote Actors library classes, and are cleared up in the release candidates for Scala 2.8. If you can upgrade to 2.8 now then your Remote Actors will leak less, though the final leak mentioned in this post may still catch you out.
If, like me, you have lots of Scala code dependent on the 2.7.7 collections classes and haven’t gotten around to refactoring for 2.8 but use remote actors heavily, then you’ll be pleased to hear that the 2.8 remote actor code can be used with the rest of the 2.7.7 codebase with few modifications.
I’m happy to provide a patched-up version of 2.7.7-final for you here. Extract this to where you’d normally extract your Scala libs, and link into it as usual.
The remainder of this post covers the three distinct, but related, memory leaks I found. At least the first and last are resolved by the patched code above.
Anatomy of the leak in 2.7.7
I traced the leak back to Proxy.scala. Proxy objects act as a go-between for incoming and outgoing messages on the Remote Actor network stack. A Proxy represents an Actor located on a remote host, and handles four main message-passing cases:
HostA: New outgoing message.
HostB: New incoming message.
HostB: Outgoing response.
HostA: Incoming response.
In 2.8, Proxy.scala maintains two data structures (a ‘channelMap’ and a ‘sessionMap’), which are modified during each of these four use-cases only for synchronous messages and futures messages. In 2.7.7-final, these structures are modified for all messages.
Here’s a high-level example of how these structures are modified. For this example I’m using two machines: ‘lifou’ (192.168.1.5), which sends messages, and ‘ikeq’ (192.168.1.3), which receives and replies to messages. The sequence of events may be as follows:
1. lifou: Calls select(), which creates a new Proxy object to represent ikeq (e.g., Sink@Node(ikeq,51236)).
2. ikeq: A new Proxy object is created at ikeq to represent lifou (creator: remotesender0@Node(192.168.1.5,51234)).
3. lifou: Sends a message to ikeq, which passes through the Proxy created in step 1. A pair is added to this Proxy’s channelMap.
4. ikeq: Receives the message via the Proxy created in step 2, and a pair is added to its sessionMap.
5. ikeq: Responds via the Proxy created in step 2. The Proxy correctly removes the appropriate entry from the sessionMap.
6. lifou: Receives the response via the Proxy created in step 1. The Proxy correctly removes the appropriate entry from the channelMap.
So while it’s reasonable to assume that synchronous and futures messages will receive a response which clears the existing state, the same assumption does not hold for asynchronous messages. Without a matching response, the state retained by 2.7.7 is never cleared for many asynchronous messages. Indeed, much of my message passing is not symmetric (many messages never receive a response), and thus my code made Proxy.scala leak badly.
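To make the difference concrete, here is a minimal toy model of the bookkeeping described above. It is not the real Proxy.scala: ToyProxy, the trackAsync flag and the "session" naming scheme are all made up for illustration. With trackAsync set to true it mimics the 2.7.7 behaviour of recording state for every message; with false it mimics 2.8, which only records state for messages that expect a reply.

```scala
import scala.collection.mutable

// Toy stand-in for a Proxy's per-message bookkeeping; NOT the real Proxy.scala.
class ToyProxy(trackAsync: Boolean) {
  // Maps a (hypothetical) session id to the local sender awaiting a reply.
  val channelMap = mutable.HashMap.empty[String, String]
  private var counter = 0

  // Record an outgoing message and return its session id.
  def send(sender: String, expectsReply: Boolean): String = {
    counter += 1
    val session = "session" + counter
    // 2.7.7-like: always record; 2.8-like: record only if a reply is expected.
    if (expectsReply || trackAsync) channelMap(session) = sender
    session
  }

  // An incoming reply clears the entry recorded for its session.
  def reply(session: String): Unit = {
    channelMap.remove(session)
    ()
  }
}

object LeakDemo {
  // Send n fire-and-forget messages and report how many entries remain.
  def leakedEntries(trackAsync: Boolean, n: Int): Int = {
    val proxy = new ToyProxy(trackAsync)
    (1 to n).foreach(_ => proxy.send("worker", expectsReply = false))
    proxy.channelMap.size
  }
}
```

In this model, every asynchronous message sent with trackAsync = true leaves an entry behind that nothing ever removes, whereas a synchronous send followed by reply leaves the map empty in either mode.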
Fortunately, the 2.8 Remote Actors code behaves pretty well. So if you’re stuck on the 2.7.7 collections classes for now, but don’t want your remote actors to leak memory, then consider using the patched version above.
Outline of one remaining leak in 2.8
On sending, say, a synchronous message, the Proxy creates a new, unique Symbol for that transaction to put into the Maps at either side. All Symbols are interned, so duplicate Symbols become references to the same backing data, reducing waste. These Symbols are stored in a WeakHashMap, which suggests that once a Symbol is no longer reachable by any reference chain other than via this WeakHashMap, it should be collected by the garbage collector. It seems, though, that interned Symbols are not collected very quickly, and do appear to constitute a leak.
If you run a long-lived process which uses many synchronous messages, you may run into this issue. I’m not aware of any fix to this, yet.
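As a rough illustration of what interning means here, the sketch below shows that two Symbols with the same name are the same instance, and that a fresh name per transaction produces a new Symbol each time. The freshSession helper and its naming scheme are assumptions of mine, not the library’s actual code, and the sketch says nothing about when the garbage collector reclaims unused Symbols.

```scala
object SymbolDemo {
  private var counter = 0

  // Hypothetical fresh-name scheme: one new, unique Symbol per transaction.
  def freshSession(): Symbol = {
    counter += 1
    Symbol("session" + counter)
  }

  // Interning: asking twice for the same name yields the same instance.
  def sameInstance(name: String): Boolean =
    Symbol(name) eq Symbol(name)
}
```

Because each transaction uses a name that has never been seen before, interning saves nothing in this case: every call adds one more entry to the table of interned Symbols, which then has to wait for the garbage collector.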
Outline of another remaining leak in 2.8
The Remote Actors library in 2.7.7 and 2.8 currently doesn’t match IP addresses to hostnames to identify an appropriate Proxy to handle a message. If you follow the 6-step sequence of events above when using hostnames, step 6 should be replaced by the following:
6. lifou: Receives the response via a new, third Proxy, created to represent ikeq by its IP address, rather than via the Proxy created in step 1.
That is, a third Proxy object is created, because the library doesn’t understand that the hostname and the IP address are equivalent. Subsequent message exchanges continue to do the following:
Outbound messages at lifou use the first proxy created in step 1.
Inbound messages at lifou use the third proxy created in step 6.
Put differently: this state accumulates because one Proxy handles outgoing messages, one handles incoming messages. The memory leak only occurs at lifou in the example, not ikeq. In 2.8, asynchronous message passing will still create multiple Proxies, but the channelMap/sessionMap state will not be modified.
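The mismatch can be sketched with a toy registry that hands out one proxy per distinct node key. This is only a model under my own assumptions: Node here is a local stand-in for the library’s node identifier (an address plus a port), and ProxyRegistry and its string "proxies" are hypothetical. The point is simply that a lookup by plain equality treats a hostname and its IP address as two different hosts.

```scala
import scala.collection.mutable

// Local stand-in for the library's node identifier (address plus port).
case class Node(address: String, port: Int)

// Toy registry: one proxy per distinct Node key, compared by plain equality.
class ProxyRegistry {
  private val proxies = mutable.HashMap.empty[Node, String]

  // Return the existing proxy for this key, or create a new one.
  def proxyFor(node: Node): String =
    proxies.getOrElseUpdate(node, "proxy" + (proxies.size + 1))

  def count: Int = proxies.size
}

object RegistryDemo {
  // Number of proxies after one outbound and one inbound lookup for one host.
  def proxiesForSameHost(): Int = {
    val registry = new ProxyRegistry
    registry.proxyFor(Node("ikeq", 51236))        // outbound: keyed by hostname
    registry.proxyFor(Node("192.168.1.3", 51236)) // inbound: keyed by IP address
    registry.count
  }
}
```

Here proxiesForSameHost() reports two proxies for what is really a single remote host, which mirrors the split between the outbound and inbound Proxy described above.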
I have created a bug report with a minor patch which resolves this issue; the patch is included in the version of Scala available above.
Alternatively, if you do not wish to change your Scala runtime, the problem is really easy to work around: if the developer resolves every hostname to its IP address before handing it to the Remote Actors library, then there is no problem. But they need to remember to do it every time they might see a hostname, to ensure there is no leaking.
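A minimal sketch of that workaround, using the standard java.net.InetAddress class. The Addresses object and the toIp name are my own; the sketch also assumes that the first address returned for a hostname is the one the remote side reports, which may not hold on multi-homed machines.

```scala
import java.net.InetAddress

object Addresses {
  // Resolve a hostname to a textual IP address; an IP literal passes through.
  // Throws java.net.UnknownHostException if the name cannot be resolved.
  def toIp(host: String): String =
    InetAddress.getByName(host).getHostAddress
}
```

The idea is to call something like Addresses.toIp("ikeq") once and use the resulting string wherever you would otherwise pass the hostname, so that outbound and inbound messages are keyed by the same address.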
Summary: What do we learn?
I found the first of these bugs only because my work involves fairly heavy use of Scala and Remote Actors. The others I discovered when rooting around to figure out where the problem was. The memory leak in the 2.7.7 case with asynchronous message passing simply “goes away” by using 2.8 Remote Actors, either in the upcoming 2.8 final release, or by grabbing the build above.
The two other problems, however, do not go away quite so easily. Of these, the latter is easily worked around, but I don’t know of a solution to the former. These aren’t showstoppers, but it’s good to be aware of how the libraries you’re using actually work.
It’s been really satisfying to be able to dig around an implementation of a language and contribute something back, even if in just a small way. Please try out the build above if it suits your requirements, and let me know if it improves your work.