Issue 100831.1: DW_OP_implicit_pointer

Author: Roland McGrath
Champion: Jakub Jelinek
Date submitted: 2010-08-31
Date revised:
Date closed:
Type: Enhancement
Status: Accepted
DWARF Version: 5
Section 2.6.1.1.3, pg 27
Implicit pointer values in debug information
============================================

Overview
--------

Optimized compilation can eliminate a pointer that exists in the
semantics of the source program (including C++ "references" and other
source language constructs with pointer-like semantics).  These may be
pointer-typed variables, or portions of aggregate data structures that
are ordinarily represented as address values.  While no actual address
will ever exist for these semantic pointers at runtime, often what they
"point to" has a known location in registers or discontiguous pieces, or
can be rematerialized from other information at runtime.

For example, consider the C function:

    static void add_point (struct point *a, const struct point *b)
    {
      a->x += b->x;
      a->y += b->y;
    }

This may be inlined away in compilation such that all the members of the
two structures exist only in registers.  A debugger cannot show the user
an address value for "a" or "b", but it can show a struct value for "*a"
or "*b", and it can evaluate "a->x" or "b->y".  Another common example
is the "this" pointers and reference-typed arguments in C++ methods
handling class types with small amounts of data so they are inlined away
into using just registers.  More complex examples can include pointers
into the middle of an array or other aggregate type, optimized-away
aggregates containing pointer-typed members, pointers to constants
optimized away, multiple levels of pointer indirection optimized away,
and combinations of all these.


Proposed changes to DWARF
-------------------------

2.6.1.1.3 Implicit Location Descriptions

An implicit location description represents a piece or all of an
object which has no actual location but whose contents are
nonetheless either known or known to be undefined.

The following DWARF operations may be used to specify a value
that has no location in the program but is a known constant or is
computed from other locations and values in the program.

1. DW_OP_implicit_value

   The DW_OP_implicit_value operation specifies an immediate
   value using two operands: an unsigned LEB128 length, followed
   by a block representing the value in the memory representation
   of the target machine. The length operand gives the length in
   bytes of the block.

2. DW_OP_stack_value

   The DW_OP_stack_value operation specifies that the object does
   not exist in memory but its value is nonetheless known and is
   at the top of the DWARF expression stack. In this form of
   location description, the DWARF expression represents the
   actual value of the object, rather than its location. The
   DW_OP_stack_value operation terminates the expression.

3. DW_OP_implicit_pointer

   The DW_OP_implicit_pointer operation specifies that the object
   is a pointer that cannot be represented as a real pointer,
   even though the value it would point to can be described. In
   this form of location description, the DWARF expression refers
   to a debugging information entry that represents the actual
   value of the object to which the pointer would point. Thus, a
   consumer of the debug information would be able to show the
   value of the dereferenced pointer, even when it cannot show
   the value of the pointer itself.

   This operation has two operands: a reference to a debugging
   information entry that describes the dereferenced object's
   value, and a signed number that is treated as a byte offset
   from the start of that value. The first operand is a 4-byte
   unsigned value in the 32-bit DWARF format, or an 8-byte
   unsigned value in the 64-bit DWARF format (see Section 7.4).
   The second operand is a signed LEB128 number.

   The first operand is used as the offset of a debugging
   information entry in a .debug_info section, which may be
   contained in a shared object or executable other than that
   containing the operator. For references from one shared object
   or executable to another, the relocation must be performed by
   the consumer.

   The referenced debugging information entry is typically a
   DW_TAG_variable or DW_TAG_formal_parameter entry whose
   DW_AT_location attribute gives a second DWARF expression or a
   location list that describes the value of the object, but it
   may be any entry that contains a DW_AT_location or
   DW_AT_const_value attribute (e.g., DW_TAG_dwarf_procedure).
   Using the second DWARF expression, the consumer can
   reconstruct the value of the object when asked to dereference
   the pointer described by the original DWARF expression
   containing the DW_OP_implicit_pointer operation.

DWARF location expressions are intended to yield the *location*
of a value rather than the value itself. An optimizing compiler
may perform a number of code transformations where it becomes
impossible to give a location for a value, but remains possible
to describe the value itself. Section 2.6.1.1.2 ("Register
Location Descriptions") describes operators that can be used to
describe the location of a value when that value exists in a
register but not in memory. The operations in this section are
used to describe values that exist neither in memory nor in a
single register.

If the compiler determines that the value of an object is
constant (either throughout the program, or within a specific
range), it may choose to materialize that constant only when
used, rather than store it in memory or in a register. The
DW_OP_implicit_value operation can be used to describe such a
value. Sometimes, the value may not be constant, but still can be
easily rematerialized when needed. A DWARF expression terminating
in DW_OP_stack_value can be used for this case. The compiler may
also eliminate a pointer value where the target of the pointer
resides in memory, and the DW_OP_stack_value operator may be used
to rematerialize that pointer value. In other cases, the compiler
will eliminate a pointer to an object that itself needs to be
materialized. Since the location of such an object cannot be
represented as a memory address, a DWARF expression cannot give
either the location or the actual value of a pointer variable
that would refer to that object. The DW_OP_implicit_pointer
operation can be used to describe the pointer, and the debugging
information entry to which its first operand refers describes the
value of the dereferenced object. A DWARF consumer will not be
able to show the location or the value of the pointer variable,
but it will be able to show the value of the dereferenced
pointer.


D.13 Implicit Pointer Examples

The C source shown in figure D.36 can be described in DWARF as
illustrated in Figure D.37, assuming the foo function is not inlined,
assuming that the argument x is passed in register 5 and that the
foo function is optimized by the compiler into just increment of
a volatile variable v.

--------------------------------------------------------
struct S { short a; char b, c; };
volatile int v;
void foo (int x)
{
  struct S s = { x, x + 2, x + 3 };
  char *p = &s.b;
  s.a++;
  v++;
}
int main ()
{
  foo (v+1);
  return 0;
}

Figure D.36: Implicit pointer example #1: source
--------------------------------------------------------


--------------------------------------------------------
1$: DW_TAG_structure_type
        DW_AT_name("S")
        DW_AT_byte_size(4)
10$:    DW_TAG_member
        DW_AT_name("a")
        DW_AT_type(reference to "short int")
        DW_AT_data_member_location(constant 0)
11$:    DW_TAG_member
        DW_AT_name("b")
        DW_AT_type(reference to "char")
        DW_AT_data_member_location(constant 2)
12$:    DW_TAG_member
        DW_AT_name("c")
        DW_AT_type(reference to "char")
        DW_AT_data_member_location(constant 3)
2$: DW_TAG_subprogram
        DW_AT_name("foo")
20$:    DW_TAG_formal_parameter
        DW_AT_name("x")
        DW_AT_type(reference to "int")
        DW_AT_location(DW_OP_reg5)
21$:    DW_TAG_variable
        DW_AT_name("s")
        DW_AT_location(expression=
        DW_OP_breg5(1) DW_OP_stack_value DW_OP_piece(2)
        DW_OP_breg5(2) DW_OP_stack_value DW_OP_piece(1)
        DW_OP_breg5(3) DW_OP_stack_value DW_OP_piece(1))
22$:    DW_TAG_variable
        DW_AT_name("p")
        DW_AT_type(reference to "char *")
        DW_AT_location(expression=
        DW_OP_implicit_pointer(reference to 21$, 2))

Figure D.37: Implicit pointer example #1: DWARF description
--------------------------------------------------------


When both the s and p variables are optimized away completely,
this DWARF description still allows to print the value of the
s variable - { 2, 3, 4 } - and, as the s variable doesn't live in
memory, there is nothing to print for value of p, but the debugger
should still be able to show that p[0] is 3, p[1] is 4, p[-1] is
on little endian 0 and p[-2] is 2.

The C source shown in figure D.38 can be described in DWARF as
illustrated in Figure D.39, assuming the foo function is inlined
into main and the body of the main function is optimized to just
3 blocks of instructions where each one increments the volatile
variable v, and finally block of instructions to return 0 from
the function.  Assume that label0 is at the start of the main
function, label1 after the first v++ block, label2 after the
second v++ block and label3 at the end of the main function.
The b variable is optimized away completely, as it isn't used,
and the "opq" string literal is optimized away as well.


--------------------------------------------------------
static const char *b = "opq";
volatile int v;
static inline void foo (int *p)
{
  (*p)++;
  v++;
  p++;
  (*p)++;
  v++;
}
int main ()
{
  int a[2] = { 1, 2 };
  v++;
  foo (a);
  return a[0] + a[1] - 5;
}

Figure D.38: Implicit pointer example #2: source
--------------------------------------------------------


--------------------------------------------------------
1$: DW_TAG_variable
        DW_AT_name("b")
        DW_AT_type(reference to "const char *")
        DW_AT_location(expression=
        DW_OP_implicit_pointer(reference to 2$, 0))
2$: DW_TAG_dwarf_procedure
        DW_AT_location(expression=
        DW_OP_implicit_value(4, {'o', 'p', 'q', '\0'}))
3$: DW_TAG_subprogram
        DW_AT_name("foo")
        DW_AT_inline(DW_INL_declared_inlined)
30$:    DW_TAG_formal_parameter
        DW_AT_name("p")
        DW_AT_type(reference to "int *")
4$: DW_TAG_subprogram
        DW_AT_name("main")
40$:    DW_TAG_variable
        DW_AT_name("a")
        DW_AT_type(reference to "int[2]")
        DW_AT_location(location list 98$)
41$:    DW_TAG_inlined_subroutine
        DW_AT_abstract_origin(reference to 3$)
42$:    DW_TAG_formal_parameter
        DW_AT_abstract_origin(reference to 30$)
        DW_AT_location(location list 99$)

! .debug_loc section
98$:    <label0 in main> .. <label1 in main>
    DW_OP_lit1 DW_OP_stack_value DW_OP_piece(4)
    DW_OP_lit2 DW_OP_stack_value DW_OP_piece(4)
    <label1 in main> .. <label2 in main>
    DW_OP_lit2 DW_OP_stack_value DW_OP_piece(4)
    DW_OP_lit2 DW_OP_stack_value DW_OP_piece(4)
    <label2 in main> .. <label3 in main>
    DW_OP_lit2 DW_OP_stack_value DW_OP_piece(4)
    DW_OP_lit3 DW_OP_stack_value DW_OP_piece(4)
    0 .. 0
99$:    <label1 in main> .. <label2 in main>
    DW_OP_implicit_pointer(reference to 40$, 0)
    <label2 in main> .. <label3 in main>
    DW_OP_implicit_pointer(reference to 40$, 4)
    0 .. 0

Figure D.39: Implicit pointer example #2: DWARF description
--------------------------------------------------------


With the DWARF description in Figure D.39, at the beginning of
each of the 3 increments of volatile variable v the current
value of the variable a in main can be viewed (it will be
{ 1, 2 }, { 2, 2 } resp. { 2, 3 } at those locations).  The
parameter p value can't be printed, it points somewhere into
the a variable which doesn't live in memory, but at the beginning
of the second v increment p[0] can be shown to be 2 and p[1] also 2,
and at the beginning of the third v increment p[0] will be 3 and
p[-1] 2.  The value of b also can't be printed, the string literal
"opq" doesn't live in memory, but the debugger should be able to
print what b points to ("opq") and that b[0] is 'o', b[3] is '\0'
etc.

-- 

Revised 2/12/2013.  Previous version: 
http://dwarfstd.org/issues/100831.1-1.html
Revised 5/26/2013:  Added Appendix D examples.
Accepted 5/28/2013.