Finalization
In object-oriented programming, finalization is the process of preparing an object for deallocation; strictly speaking, finalization is all parts of object destruction until memory deallocation itself. Finalization is formally complementary to initialization, which is the part of object creation that follows allocation, but differs significantly in practice – see contrast with initialization. Finalization fulfills a similar role as finally in exception handling; in general these are unrelated, but in some cases the behavior is identical, and the case of finally in a coroutine can be considered a form of finalization – see connection with finally. The term "final" is also used to indicate a class that cannot be inherited; this is unrelated.
Finalization varies significantly between languages and between implementations of a language, depending on memory management method, and can generally be partially controlled per-object or per-class by a user-specified finalizer or destructor, unlike deallocation. The terms "finalization" and "finalizer" are primarily used in languages with garbage collection, especially with non-deterministic object lifetimes, like Java; while "destruction" and "destructor" are used for languages with deterministic object lifetimes, like C++. This article addresses finalization in the broad sense, regardless of object lifetime or memory management method, but distinguishes how finalization differs depending on these.
Use
Finalization is primarily used for cleanup, to release memory or other resources: to deallocate memory allocated via manual memory management; to clear references if reference counting is used (decrement reference counts); to release resources, particularly in the Resource Acquisition Is Initialization (RAII) idiom; or to unregister an object. The amount of finalization varies significantly between languages, from extensive finalization in C++, which has manual memory management, reference counting, and deterministic object lifetimes; to often no finalization in Java, which has non-deterministic object lifetimes and is often implemented with a tracing garbage collector. It is also possible for there to be little or no explicit (user-specified) finalization, but significant implicit finalization, performed by the compiler, interpreter, or runtime; this is common in case of automatic reference counting, as in the CPython reference implementation of Python, or in Automatic Reference Counting in Apple's implementation of Objective-C, which both automatically break references during finalization.
Memory deallocation during finalization is common in languages like C++ where manual memory management is standard, but also occurs in managed languages when memory has been allocated outside of the managed heap (externally to the language); in Java this occurs with Java Native Interface (JNI) and ByteBuffer objects in New I/O (NIO). This latter can cause problems due to the garbage collector not being able to track these external resources, so they will not be collected aggressively enough, and can cause out-of-memory errors due to exhausting unmanaged memory – this can be avoided by treating native memory as a resource and using the dispose pattern, as discussed below.
Object resurrection
If user-specified finalizers are allowed, it is possible for finalization to cause object resurrection, as the finalizers can run arbitrary code, which may create references from live objects to objects being destroyed. For languages without garbage collection, this is a severe bug, and causes dangling references and memory safety violations; for languages with garbage collection, this is prevented by the garbage collector, most commonly by adding another step to garbage collection (after running all user-specified finalizers, check for resurrection), which complicates and slows down garbage collection.
Further, object resurrection means that an object may not be destroyed, and in pathological cases an object can always resurrect itself during finalization, making itself indestructible. To prevent this, some languages, like Java and Python (from Python 3.4) only finalize objects once, and do not finalize resurrected objects. Concretely this is done by tracking if an object has been finalized on an object-by-object basis. Objective-C also tracks finalization (at least in recent Apple versions) for similar reasons, treating resurrection as a bug.
A different approach is used in the .NET Framework, notably C# and Visual Basic .NET, where finalization is tracked by a "queue", rather than by object. In this case, if a user-specified finalizer is provided, by default the object is only finalized once (it is queued for finalization on creation, and dequeued once it is finalized), but this can be changed via calling the GC module. Finalization can be prevented by calling GC.SuppressFinalize, which dequeues the object, or reactivated by calling GC.ReRegisterForFinalize, which enqueues the object. These are particularly used when using finalization for resource management as a supplement to the dispose pattern, or when implementing an object pool.
Resource management
In languages with deterministic object lifetimes, notably C++, resource management is frequently done by tying resource possession lifetime to object lifetime, acquiring resources during initialization and releasing them during finalization; this is known as Resource Acquisition Is Initialization (RAII). This ensures that resource possession is a class invariant, and that resources are released promptly when the object is destroyed.
However, in languages with non-deterministic object lifetimes – which include all major languages with garbage collection, such as C#, Java, and Python – this does not work, because finalization may not be timely or may not happen at all, and thus resources may not be released for a long time or even at all, causing resource leaks. In these languages resources are instead generally managed manually via the dispose pattern: resources may still be acquired during initialization, but are released by calling a dispose method. Nevertheless, using finalization for releasing resources in these languages is a common anti-pattern, and forgetting to call dispose will still cause a resource leak.
In some cases both techniques are combined, using an explicit dispose method, but also releasing any still-held resources during finalization as a backup. This is commonly found in C#, and is implemented by registering an object for finalization whenever a resource is acquired, and suppressing finalization whenever a resource is released.
Contrast with initialization
Finalization is formally complementary to initialization – initialization occurs at the start of lifetime, finalization at the end – but differs significantly in practice. Both variables and objects are initialized, primarily to assign values, but in general only objects are finalized, and in general there is no need to clear values – the memory can simply be deallocated and reclaimed by the operating system.
Beyond assigning initial values, initialization is primarily used to acquire resources or to register an object with some service (like an event handler). These actions have symmetric release or unregister actions, and these can symmetrically be handled in a finalizer, which is done in RAII. However, in many languages, notably those with garbage collection, object lifetime is asymmetric: object creation happens deterministically at some explicit point in the code, but object destruction happens non-deterministically, in some unspecified environment, at the discretion of the garbage collector. This asymmetry means that finalization cannot be effectively used as the complement of initialization, because it does not happen in a timely manner, in a specified order, or in a specified environment. The symmetry is partially restored by also disposing of the object at an explicit point, but in this case disposal and destruction do not happen at the same point, and an object may be in a "disposed but still alive" state, which weakens the class invariants and complicates use.
Variables are generally initialized at the start of their lifetime, but not finalized at the end of their lifetime – though if a variable has an object as its value, the object may be finalized. In some cases variables are also finalized: GCC extensions allow finalization of variables.
Connection with finally
As reflected in the naming, "finalization" and the finally construct both fulfill similar purposes: performing some final action, generally cleaning up, after something else has finished. They differ in when they occur – a finally clause is executed when program execution leaves the body of the associated try clause – this occurs during stack unwind, and there is thus a stack of pending finally clauses, in order – while finalization occurs when an object is destroyed, which happens depending on the memory management method, and in general there is simply a set of objects awaiting finalization – often on the heap – which need not happen in any specific order.
However, in some cases these coincide. In C++, object destruction is deterministic, and the behavior of a finally clause can be produced by having a local variable with an object as its value, whose scope is a block corresponds to the body of a try clause – the object is finalized (destructed) when execution exits this scope, exactly as if there were a finally clause. For this reason, C++ does not have a finally construct – the difference being that finalization is defined in the class definition as the destructor method, rather than at the call site in a finally clause.
Conversely, in the case of a finally clause in a coroutine, like in a Python generator, the coroutine may never terminate – only ever yielding – and thus in ordinary execution the finally clause is never executed. If one interprets instances of a coroutine as objects, then the finally clause can be considered a finalizer of the object, and thus can be executed when the instance is garbage collected. In Python terminology, the definition of a coroutine is a generator function, while an instance of it is a generator iterator, and thus a finally clause in a generator function becomes a finalizer in generator iterators instantiated from this function.
History
The notion of finalization as a separate step in object destruction dates to Mongomery (1994),[1] by analogy with the earlier distinction of initialization in object construction in Martin & Odell (1992).[2] Literature prior to this point used "destruction" for this process, not distinguishing finalization and deallocation, and programming languages dating to this period, like C++ and Perl, use the term "destruction". The terms "finalize" and "finalization" are also used in the influential book Design Patterns (1994).[lower-alpha 1][3] The introduction of Java in 1995 contained finalize methods, which popularized the term and associated it with garbage collection, and languages from this point generally make this distinction and use the term "finalization", particularly in the context of garbage collection.
Notes
- ↑ Published 1994, with a 1995 copyright.
References
- ↑ Montgomery 1994, p. 120, "As with object instantiation, design for object termination can benefit from implementation of two operations for each class—a finalize and a terminate operation. A finalize operation breaks associations with other objects, ensuring data structure integrity."
- ↑ Montgomery 1994, p. 119, "Consider implementing class instatiation as a create and initialize operation, as suggested by Martin and Odell. The first allocates storage for new objects, and the second constructs the object to adhere to specifications and constraints."
- ↑ "Every new class has a fixed implementation overhead (initialization, finalization, etc.).", "destructor In C++, an operation that is automatically invoked to finalize an object that is about to be deleted."
- Martin, James; Odell, James J. (1992). Object-oriented analysis and design. Prentice-Hall. ISBN 0-13-630245-9.
- Montgomery, Stephen (1994). Object-Oriented Information Engineering: Analysis, Design, and Implementation. Academic Press. ISBN 978-0-12-505040-1.