Issue 180426.2: Line table "comment" opcode

Author: Paul T Robinson
Champion: Tom Russell
Date submitted: 2018-04-26
Date revised: 2021-05-20
Date closed: 2021-06-14
Type: Enhancement
Status: Accepted with modifications
DWARF Version: 6
Section 6.2.5.3, pg 165

There are times when it is useful to be able to introduce
padding or a no-op into a line-number program.  One example
is a producer that wants to leave room for expansion of a
function, in an incremental-linking scenario.  Another is
when a linker removes an unused function from a compilation
unit, and wants to "erase" the corresponding portion of the
line-number program without rewriting the entire .debug_line
section.

There are syntactically legal ways to do this in DWARF,
for example by using a series of DW_LNS_set_basic_block
opcodes.  However, a consumer would still have to parse
and execute these opcodes, which is less efficient than
we might like.  Or, if the area to be filled is preceded
by a DW_LNE_end_sequence opcode, the ULEB length of that
opcode could be artificially padded with 0x80 bytes, in
the hope that a consumer would be willing to parse an
arbitrarily long ULEB number.  However, this places an
unusual expectation on consumers, who might reasonably
place an upper bound on the number of ULEB bytes they
are willing to parse without complaint.
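
As an illustration of the ULEB padding trick described above,
here is a minimal sketch of a ULEB128 decoder.  The function
name decode_uleb128 and the sample byte strings are made up
for this example; they do not come from any particular
consumer.  It shows that a value can be stretched over extra
bytes by using continuation bytes that contribute no bits.

```python
def decode_uleb128(data, offset=0):
    """Decode an unsigned LEB128 value.

    Returns (value, number_of_bytes_consumed).
    Hypothetical helper written for this illustration.
    """
    value = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry data
        shift += 7
        if byte & 0x80 == 0:              # high bit clear: last byte
            break
    return value, pos - offset


# The value 1 in its minimal one-byte form.
minimal = bytes([0x01])
# The same value stretched to five bytes: 0x80 continuation
# bytes add no bits, and a final 0x00 terminates the number.
padded = bytes([0x81, 0x80, 0x80, 0x80, 0x00])

minimal_result = decode_uleb128(minimal)   # (1, 1)
padded_result = decode_uleb128(padded)     # (1, 5)
```

This decoder has no upper bound on the number of bytes it
will accept, which is exactly the behavior a consumer might
reasonably decline to provide.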

To solve this, I propose a new DW_LNE_padding opcode.
It takes one operand, which can be zero length, and has
no effect on the line number program.  This opcode allows
efficiently skipping 3 or more bytes of the line number
program.  (Any extended opcode occupies a minimum of 3
bytes.)
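
A minimal sketch of how a producer might emit the proposed
opcode follows.  The opcode value 0x05 is an assumption made
for this example (the next value after
DW_LNE_set_discriminator, 0x04); the proposal itself only asks
for "the next available opcode number".  The names
encode_uleb128, encode_padding and the fill parameter are
likewise hypothetical.

```python
# Assumption for illustration only: the proposal does not fix
# the opcode number.
DW_LNE_PADDING = 0x05


def encode_uleb128(value):
    """Encode a non-negative integer as minimal-length ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # final byte
            return bytes(out)


def encode_padding(operand_len, opcode=DW_LNE_PADDING, fill=0x00):
    """Build one extended opcode with operand_len operand bytes.

    Layout: 0x00, ULEB128 length, opcode byte, operand bytes.
    The length counts the opcode byte plus the operand.
    """
    length = 1 + operand_len
    return (bytes([0x00]) + encode_uleb128(length)
            + bytes([opcode]) + bytes([fill]) * operand_len)


smallest = encode_padding(0)   # 3 bytes: 00 01 05
seven = encode_padding(4)      # 7 bytes: 00 05 05 00 00 00 00
```

The zero-length operand case produces the three-byte minimum
mentioned above.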

In a related thread on dwarf-discuss, Cary pointed out
that given how DWARF works, any undefined extended opcode
could be used for this purpose.  The risk is that someday
we might actually define that opcode!  Reserving an opcode
for this purpose eliminates that risk.  Existing producers
can also use the new opcode regardless of the DWARF
version, because consumers already know how to skip an
unrecognized extended opcode.
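
The skipping behavior relied on here can be sketched as
follows.  This is a simplified illustration, not the code of
any real consumer: read_extended and decode_uleb128 are
hypothetical helpers, and 0x05 stands in for an opcode the
consumer does not recognize.

```python
def decode_uleb128(data, offset=0):
    """Return (value, bytes_consumed) for a ULEB128 number."""
    value = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return value, pos - offset


def read_extended(data, pos):
    """Read one extended opcode starting at data[pos].

    Returns (opcode, operand_bytes, next_pos).  Because the
    length is read before the opcode, next_pos can be computed
    without knowing what the opcode means.
    """
    if data[pos] != 0x00:
        raise ValueError("not an extended opcode")
    length, n = decode_uleb128(data, pos + 1)
    opcode_pos = pos + 1 + n
    opcode = data[opcode_pos]
    operand = data[opcode_pos + 1:opcode_pos + length]
    return opcode, operand, opcode_pos + length


# An unrecognized opcode (0x05 here, by assumption) with two
# operand bytes, followed by DW_LNE_end_sequence (0x01).
program = bytes([0x00, 0x03, 0x05, 0xAA, 0xBB,
                 0x00, 0x01, 0x01])

first = read_extended(program, 0)           # (5, b"\xaa\xbb", 5)
second = read_extended(program, first[2])   # (1, b"", 8)
```

A consumer that does not recognize the first opcode can
discard its operand and resume at next_pos.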


Ron has pointed out that certain padding lengths might
not be achievable with a single opcode due to how LEB128
is encoded.  I've decided not to add non-normative text
about this, for the following reasons: it is not a concern
for consumers; producers will already be aware of (or
quickly realize) the limitation; and the solution (use
multiple padding opcodes) is obvious.
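
To make the limitation concrete, here is a small sketch that
computes which total sizes a single padding opcode can
occupy.  It assumes the producer always writes the length as
a minimal-length ULEB128; the function names are hypothetical.

```python
def uleb128_size(value):
    """Number of bytes in the minimal ULEB128 encoding of value."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def padding_size(operand_len):
    """Total bytes occupied by one padding opcode."""
    length = 1 + operand_len            # opcode byte + operand
    return 1 + uleb128_size(length) + length


def operand_len_for(total):
    """Operand length that yields exactly `total` bytes, or None
    if no single opcode with a minimal ULEB128 length fits."""
    for uleb_bytes in range(1, 11):
        length = total - 1 - uleb_bytes
        if length >= 1 and uleb128_size(length) == uleb_bytes:
            return length - 1
    return None


fits_129 = operand_len_for(129)    # 126
fits_130 = operand_len_for(130)    # None: the length field grows
fits_131 = operand_len_for(131)    # 127

# A 130-byte gap can be covered by two opcodes, e.g. 3 + 127.
split = (operand_len_for(3), operand_len_for(127))   # (0, 124)
```

Under that assumption the first unreachable size is 130 bytes,
where the length field grows from one byte to two; the next is
16387, where it grows from two bytes to three.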

Proposed text:

6.2.5.3 Extended Opcodes

Add a new numbered entry:

  DW_LNE_padding
  The DW_LNE_padding opcode is followed by a single
  operand which consists of a sequence of arbitrary bytes,
  which may be zero length.  The size of the operand is
  implicitly given by the unsigned LEB128 integer that
  precedes the opcode.  The opcode and operand have no
  effect on the line number program.

  *This permits a producer to pad or overwrite arbitrary
  parts of a line number program, with a minimum of the
  three bytes needed to encode any extended opcode.*

7.22 Line Number Information

Add DW_LNE_padding to table 7.26 with the next available
opcode number.

--
2021-05-20:  Revised:  Name changed from 'DW_LNE_comment' to 'DW_LNE_padding'.
  Refer to operand as 'sequence of arbitrary bytes' rather than 'string'.
2021-06-14:  Accepted with modification. 
  Replace definition with 
     The DW_LNE_padding opcode is followed by a sequence of
     zero or more arbitrary bytes up to the length specified
     by the unsigned LEB128 integer that precedes all extended
     opcodes. The opcode has no effect on the line number program.