DWARF Standard





Issue: 200609.1
Author: Paul T Robinson
Title: Reserve an address value for "not present"
Type: Enhancement
Status: Open


Section: Several, pp. 51, 148, 164
Motivation:

The DWARF standard generally assumes that described entities exist in
the final executable; however, various linker operations may result in
cases where this is not true, leaving addresses pointing to unfriendly
places.  In most cases linkers will fix up undefined symbols to 0x0,
but this is a valid address on many targets.  We should have a way
to indicate that an address is invalid, i.e., a "tombstone" value that
is not interpreted as a real address.

Discussion:

Section 2.17 "Code Addresses, Ranges and Base Addresses", p. 51, line 29
says:
  If an entity has no associated machine code, none of these attributes
  are specified.
("These attributes" are DW_AT_low_pc, DW_AT_high_pc, or DW_AT_ranges.)

That's fine for a compiler emitting an entry for something that has no
associated code (e.g., declaration or abstract instance).  But there
are cases where the compiler emits code, and the linker strips it. I'm
aware of three situations where this can occur:
- functions emitted in COMDAT sections, typically C++ template
  instantiations or inline functions from a header file;
- deduplicating different functions with identical content; GNU refers
  to this as ICF (Identical Code Folding);
- functions with no callers; sometimes called dead-stripping or
  garbage collection.

In the first two cases, the duplicate copies of the "same" function
are removed and only one copy is retained in the final executable; in
the third case, no copies are retained.

In these cases, the compiler will of course emit attributes pointing
to each function's code, and it is the linker that strips the code.  It
seems unrealistic to ask the linker to rewrite the DWARF to remove the
attributes; instead, the linker will simply "do something" with the
relocations associated with the address/range attributes.

One can argue that all DIE references to deduplicated functions can be 
fixed up to the one retained copy; however, that leaves entries in 
multiple CUs pointing to the same code, including the DW_AT_ranges for 
the CUs themselves.  It does not seem like a good idea to have multiple 
CUs claiming ownership of the same range of instructions.  Similarly, the
.debug_aranges section could be left in an unfortunate state; it is not 
supposed to have overlapping or duplicate ranges.

And of course in the dead-stripping case, there is no retained copy to
point to.

The solution that Sony has adopted in its proprietary linker, and the
solution we have proposed for the LLVM project's 'LLD' linker, is to
fix up references to removed functions to "-1", or as prior DWARF
versions described it in the base-address-selection entry of a range,
"The value of the largest representable address offset (for example,
0xffffffff when the size of an address is 32 bits)."

DWARF v5 no longer uses this special value for ranges, but it seems
very useful to have a standard "tombstone" value for the above cases.
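
As an illustration only, the "-1" fix-up described above can be
sketched as follows.  The function names and the use of None to mean
"the symbol's section was discarded" are assumptions made for this
example; they do not describe how any particular linker is written.

```python
def tombstone_address(address_size: int) -> int:
    """Largest representable address for an address of address_size bytes."""
    if address_size <= 0:
        raise ValueError("address_size must be positive")
    return (1 << (8 * address_size)) - 1


def fixup_value(symbol_address, address_size: int) -> int:
    """Value written for an address relocation in debug info.

    symbol_address is None when the referenced code was removed
    (hypothetical convention for this sketch).
    """
    if symbol_address is None:
        return tombstone_address(address_size)
    # Address 0 is a legitimate address on some targets, so it is kept as-is.
    return symbol_address
```

With a 4-byte address this yields 0xffffffff for removed code, while a
retained function at address 0 still resolves to 0.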

Proposal:

Section 2.17 "Code Addresses, Ranges and Base Addresses", p. 51, line 29
Replace the paragraph:

  If an entity has no associated machine code, none of these attributes
  are specified.

with the following:

  If a producer emits no machine code for an entity, none of these
  attributes are specified.  If a producer emits machine code, and a
  later processing step removes that machine code, these attributes
  should be updated to indicate the entity does not exist (see
  Section 7.2.3 on page xxx); consumers should ignore any ranges for
  non-existent entities.
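
Not part of the proposed text: a minimal sketch of how a consumer
might apply this paragraph to DW_AT_low_pc and DW_AT_high_pc.  It
assumes DW_AT_high_pc is given as an offset from DW_AT_low_pc, and the
function name is hypothetical.

```python
from typing import Optional, Tuple


def entity_code_range(low_pc: Optional[int],
                      high_pc_offset: int,
                      address_size: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) for an entity, or None if it has no code.

    low_pc is None when the producer emitted no address attributes.
    """
    tombstone = (1 << (8 * address_size)) - 1
    if low_pc is None:
        return None          # producer emitted no machine code
    if low_pc == tombstone:
        return None          # code was emitted, then removed later
    return (low_pc, low_pc + high_pc_offset)
```

Both "never emitted" and "emitted, then removed" end up as None here,
so later stages of the consumer need only one check.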

Section 6.1.2 "Lookup by Address" p.148, lines 10-18
Append the following normative text:

  If the address value indicates there is no associated machine code
  (see Section 7.2.3 on page xxx), the descriptor should be ignored.
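
Not part of the proposed text: a sketch of an address lookup that
ignores such descriptors.  Each descriptor is modeled as an
(address, length, cu_offset) tuple with the terminating entry already
stripped; that representation and the function names are assumptions
for this example.

```python
import bisect


def build_address_index(descriptors, address_size):
    """Keep only descriptors that refer to code still present."""
    tombstone = (1 << (8 * address_size)) - 1
    return sorted((addr, addr + length, cu)
                  for addr, length, cu in descriptors
                  if addr != tombstone)


def find_cu(index, pc):
    """Return the CU offset whose range contains pc, or None."""
    starts = [start for start, _, _ in index]
    i = bisect.bisect_right(starts, pc) - 1
    if i >= 0 and index[i][0] <= pc < index[i][1]:
        return index[i][2]
    return None
```

A descriptor at address 0 is kept, since 0 can be a real address; only
the reserved value is dropped.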

Section 6.2.5.3, p.164, lines 28-34 describe DW_LNE_set_address.
Append the following to the normative text (lines 29-32):

  If the address value indicates there is no associated machine code
  (see Section 7.2.3 on page xxx), no instructions are associated with
  subsequent rows, and no other line number program opcodes should
  affect the "address" register, until the next DW_LNE_set_address or
  DW_LNE_end_sequence opcode.
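
Not part of the proposed text: a much-simplified line number state
machine showing one way to honor this rule.  The opcode tuples are a
hypothetical stand-in for the real encoding, only the "address" and
"line" registers are modeled, and dropping rows of a removed sequence
(rather than emitting them with a marker) is a choice made for this
sketch.

```python
def run_line_program(ops, address_size):
    """Return rows as (address, line, end_sequence) tuples."""
    tombstone = (1 << (8 * address_size)) - 1
    address, line, dead = 0, 1, False
    rows = []
    for op in ops:
        kind = op[0]
        if kind == "set_address":
            address = op[1]
            dead = (address == tombstone)
        elif kind == "advance_pc":
            if not dead:              # do not derive addresses from the tombstone
                address += op[1]
        elif kind == "advance_line":
            line += op[1]
        elif kind == "copy":
            if not dead:
                rows.append((address, line, False))
        elif kind == "end_sequence":
            if not dead:
                rows.append((address, line, True))
            address, line, dead = 0, 1, False   # registers reset per sequence
        else:
            raise ValueError(f"unknown opcode {kind!r}")
    return rows
```

Because the "address" register is frozen while the sequence is marked
dead, no arithmetic is ever performed on the reserved value.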

New section 7.2.3 "Address Values"

  Address values normally provide the memory address of an entity
  (for example, the value of attribute DW_AT_low_pc, and values in
  the .debug_aranges, .debug_line, .debug_loclists and .debug_rnglists
  sections).  In some cases a producer may emit machine code for an
  entity, but a linker or other later processing step might remove
  that entity.  In that case, rather than be required to rewrite the
  relevant DWARF information, the processing step may update the
  address value to the largest representable address (for example,
  0xffffffff when the size of an address is 32 bits).  Consumers 
  should understand that this indicates a non-existent entity and
  ignore any address or range that uses this reserved value.

  <nonnormative>
  Consumers should not attempt to reference that address or any
  address derived from it (for example, by address-modifying opcodes
  in the line number program, or when used as the base address in a
  range).
  </nonnormative>
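
As an illustration of the base-address case (not part of the proposed
text), a consumer resolving a range list might skip entries whose base
or start address is the reserved value.  The entry kinds loosely
mirror DW_RLE_base_address, DW_RLE_offset_pair, DW_RLE_start_end and
DW_RLE_start_length; the tuple representation and function name are
assumptions for this sketch.

```python
def resolve_ranges(entries, cu_base, address_size):
    """Return live (start, end) ranges, ignoring tombstoned ones."""
    tombstone = (1 << (8 * address_size)) - 1
    base = cu_base
    out = []
    for entry in entries:
        kind = entry[0]
        if kind == "base_address":
            base = entry[1]
        elif kind == "offset_pair":
            if base != tombstone:     # never compute tombstone + offset
                out.append((base + entry[1], base + entry[2]))
        elif kind == "start_end":
            if entry[1] != tombstone:
                out.append((entry[1], entry[2]))
        elif kind == "start_length":
            if entry[1] != tombstone:
                out.append((entry[1], entry[1] + entry[2]))
        else:
            raise ValueError(f"unknown entry kind {kind!r}")
    return out
```

Offset pairs that follow a tombstoned base address are skipped until a
later entry establishes a new base.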



