
The 2009 International Symposium on Memory Management

Trinity College Dublin
Dublin, Ireland
June 19-20, 2009

    Sponsored by
ACM SIGPLAN

Program

ISMM is a forum for research in management of dynamically allocated memory. Areas of interest include, but are not limited to: explicit storage allocation and de-allocation; garbage collection algorithms and implementations; compiler analyses to aid memory management; interactions with languages, operating systems, and hardware, especially the memory system; memory management related tools; and empirical studies of allocation and referencing behavior in programs that make significant use of dynamic memory. ISMM solicits full-length papers on all areas of memory management.


Friday, June 19, 2009

08:30-08:40 Welcome

08:40-10:10 Paper Session 1, Session Chair: Xipeng Shen

Modeling, Analysis and Throughput Optimization of a Generational Garbage Collector
David Vengerov
A New Approach to Parallelising Tracing Algorithms
Cosmin E. Oancea, Alan Mycroft, and Stephen Watt
Scalable Support for Multithreaded Applications on Dynamic Binary Instrumentation Systems
Kim Hazelwood, Greg Lueck, and Robert Cohn

10:10-10:30 Break

10:30-12:00 Paper Session 2, Session Chair: Dan Grossman

Garbage Collection in the Next C++ Standard
Hans-J. Boehm and Mike Spertus
Precise Garbage Collection for C
Jon Rafkind, Adam Wick, Matthew Flatt, and John Regehr
Self-Recovery in Server Programs
Vijay Nagarajan, Dennis Jeffrey and Rajiv Gupta

12:00-01:30 Lunch

01:30-03:00 Invited Speaker
Memory Management Issues in Non-Blocking Synchronization
Maged Michael

03:00-03:30 Break

03:30-05:00 Paper Session 3, Session Chair: Bjarne Steensgaard

Investigating the Effects of Using Different Nursery Sizing Policies on Performance
Xiaohua Guan, Witawas Srisa-an, and ChengHuan Jia
Placement Optimization Using Data Context Collected During Garbage Collection
Mauricio Serrano and Xiaotong Zhuang
Efficient Alias Set Analysis Using SSA Form
Nomair Naeem and Ondrej Lhotak


Saturday, June 20, 2009

08:30-10:00 Paper Session 4, Session Chair: Kenjiro Taura

Identification of Logically Related Heap Regions
Mark Marron, Deepak Kapur, and Manuel Hermenegildo
A Qualitative Model of Spatial Locality
Ian Christopher, Tongxin Bai, Xiaoming Gu, Chengliang Zhang, and Chen Ding
Fast Allocation Speed, Low Memory Fragmentation, and High Spatial Locality: Pick Two
Alin Jula and Lawrence Rauchwerger

10:00-10:30 Break

10:30-12:00 Wild and Crazy Ideas, Session Chair: Tony Hosking

This session is a fun and informal forum for discussion of interesting ideas in the area of memory management. Presenters are given just 4 minutes to present their idea and 1 minute to take questions. Prizes are awarded for the craziest idea and for the one most worthy of implementation! Please contribute your wild and crazy ideas.
Please contact the WACI chair, Tony Hosking, before the event.

12:00-01:30 Lunch

01:30-03:00 Paper Session 5, Session Chair: Pramod Joisha

Dynamic Shape Analysis via Degree Metrics
Maria Jump and Kathryn McKinley
Live Heap Space Analysis for Languages with Garbage Collection
Elvira Albert, Samir Genaim, and Miguel Gómez-Zamalloa
Parametric Heap Usage Analysis for Functional Programs
Leena Unnikrishnan and Scott Stoller

03:00-03:30 Break

03:30-04:00 Wrap-up

Abstracts

  • Modeling, Analysis and Throughput Optimization of a Generational Garbage Collector
    David Vengerov
    One of the garbage collectors in Sun's HotSpot Java(TM) Virtual Machine is known as the generational throughput collector, as it was designed for high throughput (the fraction of time spent on the application's work rather than on garbage collection). This paper derives an analytical expression for the throughput of this collector in terms of the following key parameters: the sizes of the “Young” and “Old” memory spaces and the value of the tenuring threshold. Based on the derived throughput model, a practical algorithm, ThruMax, is proposed for tuning the collector's parameters so as to maximize its throughput. This algorithm was implemented as an optional feature in an early release of JDK(TM) 7, and its performance was evaluated for various settings of the SPECjbb2005 workload. A consistent improvement in throughput was demonstrated when the ThruMax algorithm was enabled in JDK 7.
    [pdf]
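The throughput definition above can be illustrated with a toy steady-state cost model. The sketch below is entirely invented here (it is not the paper's derived expression, and every rate and cost constant is made up); it only shows how throughput falls out of per-collection costs and collection intervals.

```python
# Toy steady-state model (invented, not the paper's): throughput is the
# fraction of time spent on application work rather than garbage collection.
def throughput(young_mb, old_mb, alloc_rate_mb_s, survival_frac,
               minor_fixed_s=0.005, minor_s_per_mb=0.001, major_s_per_mb=0.005):
    # A minor GC runs each time the young space fills; its cost is a fixed
    # pause plus copying the surviving fraction of the young space.
    minor_interval_s = young_mb / alloc_rate_mb_s
    minor_cost_s = minor_fixed_s + minor_s_per_mb * young_mb * survival_frac
    # Survivors are promoted; a major GC runs when the old space fills.
    promotion_rate_mb_s = alloc_rate_mb_s * survival_frac
    major_interval_s = old_mb / promotion_rate_mb_s
    major_cost_s = major_s_per_mb * old_mb
    # GC seconds incurred per second of application work.
    gc_s_per_app_s = (minor_cost_s / minor_interval_s
                      + major_cost_s / major_interval_s)
    return 1.0 / (1.0 + gc_s_per_app_s)
```

Even this toy reproduces the qualitative trade-off a tuning algorithm like ThruMax must navigate: enlarging the young space amortizes the fixed cost of minor collections, so throughput rises until other constraints bind.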

  • A New Approach to Parallelising Tracing Algorithms
    Cosmin E. Oancea, Alan Mycroft, and Stephen Watt
    Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection, marshalling, etc. Traditional sequential algorithms use a worklist; visiting a node takes an item from the worklist and adds its unvisited children. Previous work on parallel tracing is processor-oriented in associating one worklist per processor: worklist insertion and removal requires no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors.
    This paper proposes a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At a given time at most one processor owns a given worklist. By arranging separate (single-reader single-writer) forwarding queues to pass nodes from processor i to processor j we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies.
    It is clear that this latter solution would be more effective on NUMA systems (and even necessary when processor-local memory may not be addressed from other processors); however, slightly surprisingly, it often gives significantly better speed-up on modern multi-core architectures too---using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention.
    We obtain speedups as high as five and six times the optimal (synchronisation-free) sequential timing on six and eight processors, respectively.
    [ppt]
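The memory-oriented scheme can be sketched sequentially. The code below is an invented, single-threaded rendition: each segment owns a worklist holding only its own nodes, and cross-segment edges are forwarded to the target segment's list rather than visited in place. In the real design each worklist is owned by one processor and forwarding uses single-reader single-writer queues, so the mainline needs no locks.

```python
from collections import deque

# Sequential sketch of segment-partitioned tracing (invented illustration).
# heap: dict mapping node -> list of child nodes; segment_of: node -> segment id.
def trace(heap, roots, num_segments, segment_of):
    worklists = [deque() for _ in range(num_segments)]
    marked = set()
    for r in roots:
        worklists[segment_of(r)].append(r)
    # Round-robin over segments; a parallel version runs one owner thread
    # per worklist, with forwarding queues between owners.
    while any(worklists):
        for s in range(num_segments):
            while worklists[s]:
                node = worklists[s].popleft()
                if node in marked:
                    continue
                marked.add(node)
                for child in heap.get(node, ()):
                    # Forward cross-segment edges to the segment that owns
                    # the child; same-segment edges stay on this worklist.
                    worklists[segment_of(child)].append(child)
        # Loop again: forwarding may have repopulated earlier segments.
    return marked
```

Because each node is only ever marked by the owner of its segment, the duplicate-visit races of per-processor worklists disappear by construction.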

  • Scalable Support for Multithreaded Applications on Dynamic Binary Instrumentation Systems
    Kim Hazelwood, Greg Lueck, and Robert Cohn
    Dynamic binary instrumentation systems are used to inject or modify arbitrary instructions in existing binary applications, and several such systems have been developed over the past decade. Yet, much of the literature describing the internal architecture and performance of these systems has focused on executing single-threaded guest applications. In this paper, we discuss the specific design decisions necessary for supporting large, multithreaded applications on JIT-based dynamic instrumentation systems. While providing a working solution to multithreading is straightforward, providing a solution that scales in terms of memory and performance is much more intricate. We highlight the design decisions implemented in the latest version of the Pin dynamic instrumentation system, including the just-in-time compiler, the emulator, and the code cache. The overall design strives to provide scalable performance and memory footprints on modern applications.
    [ppt]

  • Garbage Collection in the Next C++ Standard
    Hans-J. Boehm and Mike Spertus
    C++ has traditionally relied on manual memory management. Sometimes this has been augmented by limited reference counting, implemented in libraries, and requiring use of separate pointer types. In spite of the fact that conservative garbage collectors have been used with C for decades, and with C++ for almost as long, they have not been well-supported by language standards. This in turn has limited their use.
    We have led an effort to change this by supporting optional "transparent" garbage collection in the next C++ standard. This is designed to either garbage collect or detect leaks in code using normal unadorned C++ pointers.
    We initially describe an ambitious effort that would have allowed programmers to explicitly request garbage collection. It faced a number of challenges, primarily in correct interaction with existing libraries relying on explicit destructor invocation. This effort was eventually postponed to the next round of standardization.
    This initial effort was then temporarily replaced by minimal support in the language that officially allows garbage collected implementations. Such minimal support is included in the current committee draft for the next C++ standard. It imposes an additional language restriction that makes it safe to garbage collect C++ programs. Stating this restriction proved subtle.
    We also provide narrow interfaces that make it easy to both correct code violating this new restriction, and to provide hints to a conservative garbage collector to improve its performance. These provide interesting implementation challenges. We discuss partial solutions.
    [ppt]

  • Precise Garbage Collection for C
    Jon Rafkind, Adam Wick, Matthew Flatt, and John Regehr
    Magpie is a source-to-source transformation for C programs that enables precise garbage collection, where precise means that integers are not confused with pointers, and the liveness of a pointer is apparent at the source level. Precise GC is mainly useful for long-running programs and programs that interact with untrusted components. In particular, we have successfully deployed precise GC in the C implementation of a language run-time system that was originally designed to use conservative GC. We also report on our experience in transforming parts of the Linux kernel to use precise GC instead of manual memory management.
    [pdf]

  • Self-Recovery in Server Programs
    Vijay Nagarajan, Dennis Jeffrey and Rajiv Gupta
    It is important that long-running server programs retain availability amidst software failures. However, server programs do fail, and one of the important causes of failure is memory errors. Software bugs in the server code, such as buffer overflows and integer overflows, are exposed by certain user requests, leading to memory corruption, which can often result in crashes. One safe way of recovering from these crashes is to periodically checkpoint program state and roll back to the most recent checkpoint on a crash. However, checkpointing program state periodically can be quite expensive. Furthermore, since recovery can involve rolling back considerable state information in addition to replaying several benign user requests, the throughput and response time of the server can be reduced significantly during rollback recovery. In this paper, we first conducted a detailed study of how memory corruption propagates in server programs. Our study shows that memory locations corrupted during the processing of a user request generally do not propagate across user requests. On the contrary, corrupted memory locations are generally cleansed automatically, as memory (stack or heap) gets deallocated or overwritten with uncorrupted values. This self-cleansing property of server programs led us to believe that recovering from crashes does not necessarily require an expensive rollback of state. Motivated by this observation, we propose SRS, a technique for self-recovery in server programs that takes advantage of self-cleansing to recover from crashes. Those memory locations that are not fully cleansed are restored in a demand-driven fashion, which makes SRS very efficient. Thus in SRS, when a crash occurs, instead of rolling back to a safe state, the crash is suppressed and the program is made to execute forwards past the crash; we employ a mechanism called crash suppression to prevent further crashes from recurring as execution proceeds forwards. Experiments conducted on real-world server programs with real bugs show that in each case the server program could efficiently recover from the crash, and the faulty user request was isolated from future benign user requests.
    [ppt]

  • Memory Management Issues in Non-Blocking Synchronization
    Maged Michael
    Non-blocking synchronization offers high availability and strong progress guarantees. Non-blocking implementations of shared data types are guaranteed to be available, even under worst-case thread scheduling decisions and the possibility of threads crashing at arbitrary points. The inherent characteristics of non-blocking algorithms pose intricate memory management problems. These problems include dynamic memory allocation and deallocation, determining when it is safe to reclaim dynamic memory blocks, and bounding the memory use of non-blocking algorithms. This talk presents an introduction to non-blocking synchronization and a discussion of the memory management problems that arise in non-blocking algorithms, known solutions and their trade-offs and limitations, open problems, and the interaction between automatic garbage collection and non-blocking synchronization.
    [pps]

  • Investigating the Effects of Using Different Nursery Sizing Policies on Performance
    Xiaohua Guan, Witawas Srisa-an, and ChengHuan Jia
    In this paper, we investigate the effects of using different nursery sizing policies on garbage collection performance and overall performance. As part of our investigation, we modify the parallel generational collector in HotSpot to support a fixed-ratio policy and a heap-availability policy (similar to that used in Appel collectors). We then investigate their effects on twenty large and small multithreaded Java benchmarks; each operates in a reasonably sized heap. The result of our investigation indicates that most of the benchmarks are sensitive to different heap sizing policies, resulting in overall performance differences that can range from 1 percent to 36 percent. We also find that in our server application benchmark, more than one policy may be needed. As a preliminary study, we introduce a hybrid policy that uses one policy when heap space is plentiful to yield optimal performance and then switches to a different policy to yield more graceful performance degradation under heavy memory pressure.
    [pptx]

  • Placement Optimization Using Data Context Collected During Garbage Collection
    Mauricio Serrano and Xiaotong Zhuang
    We present a study on data context for object-oriented programs. We first introduce several data structures related to data context that can properly organize object fields, object types, and the access sequence in a compact manner. To make the overhead of collecting data context negligible, our approach combines the collection of data context with commonly used garbage collectors in a virtual machine environment. The garbage collector maintains extra runtime data for the building of data contexts. To save memory space and also the time spent on retrieving data, a shorter representation is proposed which sacrifices a small amount of accuracy. To further demonstrate the usefulness of data context for dynamic optimizations, we implemented a placement optimization that captures frequently missed data accesses and places relevant objects to reduce data cache misses and improve performance. The proposed scheme was evaluated extensively with several benchmarks in various experimental setups. We demonstrate that building the data context for program understanding and optimization is valuable in a virtual machine environment by using a garbage collector, while keeping the extra overhead below 1%.
    [ppt]

  • Efficient Alias Set Analysis Using SSA Form
    Nomair Naeem and Ondrej Lhotak
    Precise, flow-sensitive analyses of pointer relationships often represent each object using the set of local variables that point to it (the alias set), possibly augmented with additional predicates. Many such analyses are difficult to scale due to the size of the abstraction and due to flow sensitivity. The focus of this paper is on efficient representation and manipulation of the alias set. Taking advantage of certain properties of static single assignment (SSA) form, we propose an efficient data structure that allows much of the representation of sets at different points in the program to be shared. The transfer function for each statement, instead of creating an updated set, makes only local changes to the existing data structure representing the set. The key enabling properties of SSA form are that every point at which a variable is live is dominated by its definition, and that the definitions of any set of simultaneously live variables are totally ordered according to the dominance relation. We represent the variables pointing to an object using a list ordered consistently with the dominance relation. Thus, when a variable is newly defined to point to the object, it need only be added to the head of the list. A back edge at which some variables cease to be live requires only dropping variables from the head of the list. We prove that the analysis using the proposed data structure computes the same result as a set-based analysis. We empirically show that the proposed data structure is more efficient in both time and memory requirements than set implementations using hash tables and balanced trees.
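The core data-structure idea can be sketched as a shared, dominance-ordered list (code and names invented for illustration): because each variable's definition dominates every point where it is live, a new definition is simply consed onto the head, and the tail is shared with the alias set at the dominating program point.

```python
# Invented sketch of a dominance-ordered alias set with structural sharing.
class Cons:
    """One link of an immutable alias-set list, newest definition first."""
    __slots__ = ("var", "tail")
    def __init__(self, var, tail=None):
        self.var, self.tail = var, tail

def add_def(alias_set, var):
    # A variable newly defined to point at the object: O(1) prepend.
    # The old list is untouched, so the set at the dominating point is shared.
    return Cons(var, alias_set)

def to_set(alias_set):
    """Materialize the list as a plain Python set (for comparison/printing)."""
    out = set()
    while alias_set is not None:
        out.add(alias_set.var)
        alias_set = alias_set.tail
    return out
```

Dropping variables that die at a back edge is equally cheap in this scheme: revert to a tail of the list, again without copying.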

  • Identification of Logically Related Heap Regions
    Mark Marron, Deepak Kapur, and Manuel Hermenegildo
    This paper introduces a novel set of heuristics for identifying logically related sections of the heap such as recursive data structures, objects that are part of the same multi-component structure, and related groups of objects stored in the same collection/array. When combined with lifetime properties of these structures, this information can be used to drive a range of program optimizations including pool allocation, object co-location, static deallocation, and region-based garbage collection. The technique outlined in this paper also improves the efficiency of the static analysis by providing a compact normal form for the abstract models (speeding the convergence of the static analysis).
    We focus on two techniques for grouping parts of the heap. The first is a technique for precisely identifying recursive data structures in object-oriented programs based on the types declared in the program. The second technique is a novel method for grouping objects that make up the same composite structure and that allows us to partition the objects stored in a collection/array into groups based on a similarity relation. We provide a parametric component in the similarity relation in order to support specific analysis applications (such as a numeric analysis which would need to partition the objects based on numeric properties of the fields). Using the Em3d and Barnes-Hut benchmarks from the JOlden suite we show how these grouping methods can be used to identify various types of logical structures and enable the application of many region-based program optimizations.
    [pptx]

  • A Qualitative Model of Spatial Locality
    Ian Christopher, Tongxin Bai, Xiaoming Gu, Chengliang Zhang, and Chen Ding
    Good spatial locality alleviates both the latency and bandwidth problems of memory access by boosting the effect of prefetching and improving the utilization of cache. The conventional definition is often based on miss rates, while recent results have shown that changing data layout can improve performance without reducing the miss rate. In addition, the overall miss rate does not indicate whether there are separate components of spatial locality, or how much each component and the overall locality can be improved.
    In this paper we present the design and the evaluation of a component model of spatial locality. The new definition is based on the change of reuse distance as the granularity of data increases. It divides run-time memory accesses into components by grouping them based on the length of their reuse distance. Finally, it reduces the analysis overhead using training analysis on small inputs and sampling analysis based on an existing tool. We have implemented the system as a programming tool and report the spatial locality of a set of large benchmarks. In a user study, the tool helped to locate a data-layout issue and improve performance by 7% with a 6-line change in an application with over 2,000 lines.
    [pdf]
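The underlying measurement can be sketched as follows (invented code, not the authors' tool): compute reuse distances at two data granularities and compare. If coarsening the granularity shortens reuse distances sharply, accesses to nearby addresses cluster in time, i.e., spatial locality is good.

```python
# Invented sketch: reuse distance of an access is the number of distinct
# data blocks touched since the previous access to the same block.
def reuse_distances(trace, granularity=1):
    """trace: list of addresses; granularity: bytes/elements per block."""
    blocks = [addr // granularity for addr in trace]
    last_seen = {}
    dists = []
    for i, b in enumerate(blocks):
        if b in last_seen:
            # Distinct blocks touched since the previous access to b.
            # (Quadratic re-scan: fine for a sketch, not for real traces.)
            dists.append(len(set(blocks[last_seen[b] + 1:i])))
        else:
            dists.append(None)  # cold access: infinite reuse distance
        last_seen[b] = i
    return dists
```

In the example below, addresses 0, 1, 2 fall into one 4-element block, so at block granularity every access after the first is an immediate reuse.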

  • Fast Allocation Speed, Low Memory Fragmentation, and High Spatial Locality: Pick Two
    Alin Jula and Lawrence Rauchwerger
    Dynamic memory allocators affect an application's performance through their data layout quality. They can use an application's allocation hints to improve the spatial locality of this layout. However, a practical approach needs to be automatic, without user intervention. In this paper we present two locality-improving allocators that use allocation hints provided automatically by the C++ STL library to improve an application's spatial locality. When compared to state-of-the-art allocators on seven real-world applications, our allocators run on average 7% faster than the Lea allocator, and 17% faster than FreeBSD's allocator, with the same memory fragmentation as the Lea allocator, one of the best allocators.
    While considering locality as an important goal, locality-improving allocators must not abandon the existing constraints of fast allocation speed and low fragmentation. These constraints challenge their design and implementation. We experimentally show that within a memory allocator, allocation speed, memory fragmentation, and spatial locality compete with each other in a game of rock, paper, scissors: when one tries to improve one trait, the others suffer. We conclude that our allocators achieve a good balance of these traits, and that they can easily be adjusted to optimize the most important trait for each application.
    [pdf]

  • Dynamic Shape Analysis via Degree Metrics
    Maria Jump and Kathryn McKinley
    Applications continue to increase in size and complexity, making debugging and program understanding more challenging. Programs written in managed languages, such as Java, C#, and Ruby, further exacerbate this challenge because they tend to encode much of their state in the heap. This paper introduces dynamic shape analysis, which seeks to characterize data structures used by programs by dynamically summarizing the object pointer relationships in the heap and detecting dynamic invariants based on class. The analysis identifies recursive data structures, automatically discovers dynamic heap invariants, and reports errors when invariants are violated. Uses of dynamic shape analysis include helping programmers find data structure errors during development, generating assertions for verification with static or dynamic analysis, and detecting subtle errors in deployment. We implement dynamic shape analysis in ShapeUp. Using SPECjvm and DaCapo benchmarks, we show that most objects in the heap are part of recursive data structures that maintain strong dynamic invariants. We show that once dynamic shape analysis establishes invariants from correct executions, it can find automatically inserted errors on subsequent executions in microbenchmarks. This suggests it can be used in deployment for improving software reliability. This paper illustrates that summarization of heap-allocated objects is useful for program understanding and debugging.
    [ppt]
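A minimal version of the degree idea (our own toy, not ShapeUp's implementation) summarizes the heap's points-to graph per class by maximum in- and out-degree. An invariant such as "every Node has in-degree and out-degree at most 1" is exactly the singly-linked-list shape; a later execution that violates it signals a potential data-structure error.

```python
from collections import defaultdict

# Toy per-class degree summary of a heap points-to graph (invented sketch).
# edges: iterable of (src_object, dst_object); class_of: object -> class name.
def degree_summary(edges, class_of):
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    objs = {o for e in edges for o in e}
    # For each class, record the largest in/out degree seen in this snapshot.
    summary = defaultdict(lambda: {"max_in": 0, "max_out": 0})
    for o in objs:
        c = summary[class_of(o)]
        c["max_in"] = max(c["max_in"], indeg[o])
        c["max_out"] = max(c["max_out"], outdeg[o])
    return dict(summary)
```

Comparing summaries across heap snapshots or executions is then a cheap way to check that a class's degree invariants still hold.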

  • Live Heap Space Analysis for Languages with Garbage Collection
    Elvira Albert, Samir Genaim, and Miguel Gómez-Zamalloa
    The peak heap consumption of a program is the maximum size of the live data on the heap during the execution of the program, i.e., the minimum amount of heap space needed to run the program without exhausting the memory. It is well-known that garbage collection (GC) makes the problem of predicting the memory required to run a program difficult. This paper presents, to the best of our knowledge, the first live heap space analysis for garbage-collected languages which infers accurate upper bounds on the peak heap usage of a program's execution that are not restricted to any complexity class, i.e., we can infer exponential, logarithmic, polynomial, etc., bounds. Our analysis is developed for an object-oriented bytecode language with a scoped-memory manager that reclaims memory when methods return. We also show how our analysis can accommodate other GC schemes which are closer to the ideal GC which collects objects as soon as they become unreachable. The practicality of our approach is experimentally evaluated on a prototype implementation. We demonstrate that it is fully automatic, reasonably accurate, and efficient by inferring live heap space bounds for a standardized set of benchmarks, the JOlden suite.
    [pdf]

  • Parametric Heap Usage Analysis for Functional Programs
    Leena Unnikrishnan and Scott Stoller
    This paper presents an analysis that derives a formula describing the worst-case live heap space usage of programs in a functional language with automated memory management (garbage collection). First, the given program is automatically transformed into bound functions that describe upper bounds on the live heap space usage and other related space metrics in terms of the sizes of function arguments. The bound functions are simplified and rewritten to obtain recurrences, which are then solved to obtain the desired formulas characterizing the worst-case space usage. These recurrences may be difficult to solve due to uses of the maximum operator. We give methods to automatically solve categories of such recurrences. Our analysis determines and exploits monotonicity and monotonicity-like properties of bound functions to derive upper bounds on heap usage, without considering behaviors of the program that cannot lead to maximal space usage.
    [ppt]
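As a hand-worked illustration (this example and its cost accounting are invented here, not taken from the paper), consider naive list reversal, reverse(xs) = append(reverse(tail xs), [head xs]). A bound function for its live heap contains a maximum over the phases of a call, which is exactly the kind of recurrence the analysis must solve.

```python
# Invented toy bound function: an upper bound, in cons cells, on the heap
# simultaneously live while naively reversing a list of length n.
def live_heap_reverse(n):
    if n == 0:
        return 0
    # Phase 1: the n-1 tail cells stay live while the recursive call runs.
    # Phase 2: the reversed n-1 cells plus the n fresh cells append builds.
    return max((n - 1) + live_heap_reverse(n - 1),
               (n - 1) + n)
```

Which arm of the max dominates depends on n; deciding that symbolically, via monotonicity properties of the bound functions, is what lets such recurrences be solved into closed-form space bounds.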


Wild and Crazy Ideas
This session was a fun and informal forum for discussion of interesting ideas in the area of memory management.

  • A Non Readable Heap, Laurence Hellyer
    [pptx]
  • REFERENCE COUNTING: Tired Old Technique, or Part of the Multi-Core Future?, Kevin Hoffman
    [pdf]
  • The meaning of “volatile”, Hans Boehm
    [ppt]
  • Registers and Threads, M. Anton Ertl
    [pdf]
  • OCaml for MultiCore, Philippe Wang
    [pdf]
  • Non-tracing, non-conservative, non-blocking, scalable, locality-preserving, but probabilistic, garbage collector, Kenjiro Taura
    [pdf]
  • Duplication, Guy Steele
    [pdf]

Supporters

ISMM is grateful for the generous contributions from its supporters.
Interested in becoming an ISMM'09 supporter? Please contact Hillel Kolodner.

Microsoft Research and IBM
