Issue 140724.1: Line File Table

Author: Ron Brender
Champion: Ron Brender
Date submitted: 2014-07-24
Date revised:
Date closed:
Type: Enhancement
Status: Accepted with modifications
DWARF Version: 5
This is a REPLACEMNT for Issue 140724.1 which was discussed at
the August 19 meeting. (That proposal itself was a follow on to
previously approved Issues 140323.1 and 140331.2 which were
reconsidered at the August meeting and rejected in favor of
considering 140724.1). This version reflects changes and consensus
from the August meeting.

The main changes in this version compared to the draft considered
before are:
 - A new form DW_FORM_line_str is introduced, which is designed
   for references into the .debug_line_str section.
 - Clarify that there is no .debug_line_str.dwo section to be used
   in a .dwo file; rather string references in the .debug_line
   section should use form DW_FORM_strp or DW_FORM_strx.
 - Clean up latent file stripping problems pointed out by Cary.
 - Add form DW_FORM_data16 for holding MD5 values (in particular).

This version has also benefited greatly from review by Cary.
 
Here is the revised proposal:

=====================================================================

(1) In 6.2.4, add two new fields following the version field and
renumber subsequent bullets (and renumber subsequent fields):

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3. address_size (ubyte)
    A 1-byte unsigned integer containing the size in bytes of an
    address (or offset portion of an address for segmented addressing)
    on the target system.
   
    *The address_size field is new in DWARF Version 5. It is needed
    to legitimize the common practice of stripping all but the line
    number sections (.debug_line and .debug_line_str) from an executable.*

 4. segment_size (ubyte)
    A 1-byte unsigned integer containing the size in bytes of a segment
    selector on the target system.
   
    *The segment_size field is new in DWARF Version 5. It is needed
    in combination with the address_size field (preceding) to accurately
    characterize the address representation on the target system.*  
---------------------------------------------------------------------

=====================================================================

(2) In 6.2.4, replace all text starting at bullet number 11
through the end of the section with the following (note that the
previous bullet 11 becomes new bullet 13 because of the two fields
added above):

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*The remaining fields provide information about the
source files used in the compilation. These fields
have been revised in DWARF Version 5 to support these
goals:
  . To allow new alternative means for a consumer to
    check that a file it can access is the same version
    as that used in the compilation.
  . To allow a producer to collect file name strings in
    a new section (.debug_line_str) that can be used
    to merge duplicate file name strings.
  . To add the ability for producers to provide vendor-
    defined information that can be skipped by a consumer
    that is unprepared to process it.*

13. directory_entry_format_count (ubyte)
    A count of the number of entries that occur in the
    following directory_entry_format field.
   
14. directory_entry_format (sequence of uleb pairs)
    A sequence of directory entry format descriptions.
    Each description consists of a pair of uleb values:
      . A content type code (see below)
      . A form code using the attribute form codes
   
15. directories_count (uleb)
    A count of the number of entries that occur in the
    following directories field.
   
16. directories (sequence of directory names)
    A sequence of directory names and optional related
    information. Each entry is encoded as described
    by the directory_entry_format field.
   
    Entries in this sequence describe each path that was
    searched for included source files in this compilation,
    including the compilation directory of the compilation.
    (The paths include those directories specified by the
    user for the compiler to search and those the compiler
    searches without explicit direction.)
   
    The first entry is the current directory of the compilation.
    Each additional path entry is either a full path name or
    is relative to the current directory of the compilation.
   
    The line number program assigns a number (index) to each
    of the directory entries in order, beginning with 0.
   
    *Prior to DWARF Version 5, the current directory was not
    represented in the directories field and a directory index
    of 0 implicitly referred to that directory as found in the
    DW_AT_comp_dir attribute of the compilation unit DIE. In
    DWARF Version 5, the current directory is explicitly present
    in the directories field. This is needed to legitimize the
    common practice of stripping all but the line number sections
    (.debug_line and .debug_line_str) from an executable.*
   
    *Note that if a .debug_line_str section is present, both
    the compilation unit DIE and the line number header can
    share a single copy of the current directory name string.*
   
17. file_name_entry_format_count (ubyte)
    A count of the number of file entry format entries that
    occur in the following file_name_entry_format field. If this
    field is zero, then the file_names_count field (see below)
    must also be zero.

18. file_name_entry_format (sequence of uleb pairs)
    A sequence of file entry format descriptions.
    Each description consists of a pair of uleb values:
      . A content type code (see below)
      . A form code using the attribute form codes

19. file_names_count (uleb)
    A count of the number of file name entries that occur
    in the following file_names field.
   
20. file_names (sequence of file name entries)
    A sequence of file names and optional related
    information. Each entry is encoded as described
    by the file_name_entry_format field (in the
    order described).
   
    Entries in this sequence describe source files that
    contribute to the line number information for this
    compilation or is used in other contexts, such as in
    a declaration coordinate or a macro file inclusion.
 
    The primary source file is described by an entry whose
    path name exactly matches that given in the DW_AT_name
    attribute in the compilation unit.
   
    The line number program assigns numbers to each of
    the file name entries in order, beginning with 1, and uses
    those numbers instead of file names in the line number
    program that follows.

*A producer may generate an empty file_names field using
a file_names_count value of 0 and subsequently define file
names using the extended opcodes DW_LNE_define_file or
DW_LNE_define_file_MD5 (although these may not be intermixed
in the name line number section). In this case, the
file_name_entry_format_count should also be zero.*

6.2.4.1 Standard Content Descriptions
DWARF-defined content type codes are used to indicate
the type of information that is represented in one
component of an include directory or file name description.
The following type codes are defined.

1.  DW_LNCT_path
    The component is a null-terminated path name string.
    If the associated form code is DW_FORM_string, then the
    string occurs immediately in the containing directories
    or file_names field. If the form code is DW_FORM_line_strp,
    then the string is included in the .debug_line_str section
    and its offset occurs immediately in the containing
    directories or file_names field.

    *Note that this use of DW_FORM_line_strp is similar to
    DW_FORM_strp but refers to the .debug_line_str section,
    not .debug_str.*
   
    In a .debug_line.dwo section, the form DW_FORM_strx may
    also be used. This refers into the .debug_str_offsets.dwo
    section (and indirectly also the .debug_str.dwo section)
    because no .debug_line_str_offsets.dwo or .debug_line_str.dwo
    sections exist or are defined for use in split objects. (The
    form DW_FORM_string may also be used, but this precludes the
    benefits of string sharing.)
   
    In the 32-bit DWARF format, the representation of a
    DW_FORM_line_strp value is a 4-byte unsigned offset; in the
    64-bit DWARF format, it is an 8-byte unsigned offset (see
    Section 7.4 on page 163).
   
2.  DW_LNCT_directory_index
    The unsigned directory index represents an entry in the
    directories field of the header. The index is 0 if
    the file was found in the current directory of the compilation
    (hence, the first directory in the directories field),
    1 if it was found in the second directory in the directories
    field, and so on.
   
    This content code is always paired with one of DW_FORM_data1,
    DW_FORM_data2 or DW_FORM_udata.
   
    *The optimal form for a producer to use (which results in the
    minimum size for the set of include_index fields) depends not only
    on the number of directories in the directories
    field, but potentially on the order in which those directories are
    listed and the number of times each is used in the file_names field.
    However, DW_FORM_udata is expected to be near optimal in most common
    cases.*
   
 3. DW_LNCT_timestamp
    DW_LNCT_timestamp indicates that the value is the implementation-
    defined time of last modification of the file, or 0 if not
    available. It is always paired with one of the forms
    DW_FORM_udata, DW_FORM_data4, DW_FORM_data8 or DW_FORM_block.
   
4.  DW_LNCT_size
    DW_LNCT_size indicates that the value is the unsigned size of the
    file in bytes, or 0 if not available. It is paired with one of the
    forms DW_FORM_udata, DW_FORM_data1, DW_FORM_data2, DW_FORM_data4,
    or DW_FORM_data8.
   
5.  DW_LNCT_MD5
    DW_LNCT_MD5 indicates that the value is a 16-byte MD5 digest
    of the file contents. It is paired with form DW_FORM_data16.

*Using this representation, the information found in a DWARF
Version 4 line number header could be encoded as follows:*

    Field  Field Name                       Value(s)
    Number
    1      *Same as in Version 4*      ...
    2      version                          5
    3      *Not present in Version 4*  -
    4      *Not present in Version 4*  -
    5-12   *Same as in Version 4*      ...
    13     directory_entry_format_count     1
    14     directory_entry_format           DW_LNCT_path, DW_FORM_string
    15     directories_count                <n+1>
    16     directories                      <n+1>*<null terminated string>
    17     file_name_entry_format_count     4
    18     file_name_entry_format           DW_LNCT_path, DW_FORM_string,
                                            DW_LNCT_directory_index, DW_FORM_udata,
                                            DW_LNCT_timestamp, DW_FORM_udata,
                                            DW_LNCT_size, DW_FORM_udata
    19     file_names_count                 <m>
    20     file_names                       <m>*{<null terminated string>,
                                            <index>, <timestamp>, <size>}

6.2.4.2 Vendor-defined Content Descriptions
Vendor-defined content descriptions may be defined using content
type codes in the range DW_LNCT_lo_user to DW_LNCT_hi_user. Each
such code may be combined with one or more forms from the set:
DW_FORM_block, DW_FORM_block1, DW_FORM_block2, DW_FORM_block4,
DW_FORM_data1, DW_FORM_data2, DW_FORM_data4, DW_FORM_data8,
DW_FORM_flag, DW_FORM_line_strp, DW_FORM_sdata, DW_FORM_sec_offset,
DW_FORM_string, DW_FORM_strp, DW_FORM_strx, and DW_FORM_udata.

*If a consumer encounters a vendor-defined content type that
it does not understand, it should skip the content data as though
it was not present.*

=====================================================================

(3) In Section 6.3.1, bullet 4., second paragraph: add DW_FORM_data16
to the list of forms.

=====================================================================

(4) In Section 7.4, bullet 3., add the following row in the table:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        DW_FORM_line_strp        .debug_line_strp
---------------------------------------------------------------------

=====================================================================

(5) In Section 7.4, insert a new bullet number 4. and renumber the
current 4. and higher appropriately.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
4. Within the body of the .debug_line section, certain forms of content
description depend on the choice of DWARF format as follows. For the
32-bit DWARF format, the value is a 32-bit unsigned integer; for the
64-bit DWARF format, the value is a 64-bit unsigned integer.

        Form                  Role
        -----------------------------------------------
        DW_FORM_line_strp     offset in .debug_line_str
        -----------------------------------------------

=====================================================================

(6) In Section 7.5.4, bullet constant, replace the first two sentences
with:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    There are seven forms of constants. There are fixed length constant
    data forms for one, two, four, eight and sixteen byte values
    (respectively, DW_FORM_data1, DW_FORM_data2, DW_FORM_data4,
    DW_FORM_data8 and DW_FORM_data16).
---------------------------------------------------------------------

In the following paragraph, add DW_FORM_data16 to th list of fixed length
forms.

=====================================================================

(7) In Section 7.5.4, in the bullet for string, in the second unnumbered
bullet, replace the first sentence (which begins "as an offset") with:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  - as an offset into a string table contained in either the .debug_str
    section (DW_FORM_strp) or the .debug_line_str section (DW_FORM_line_strp)
    of the object file.
---------------------------------------------------------------------

=====================================================================

(8) Replace Table 7.25 and its introductory sentence with
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    The format encodings used for the directory and file name entries
    in a line number table header are given in Table 7.25
   
        Table 7.25: Line number content type encodings
    
        File entry content type    Value
        ---------------------------------
        DW_LNCT_path +             0x1
        DW_LNCT_include_index +    0x2
        DW_LNCT_timestamp +        0x3
        DW_LNCT_size +             0x4
        DW_LNCT_MD5 +              0x5
        DW_LNCT_lo_user +          0x2000
        DW_LNCT_hi_user +          0x3fff
        ---------------------------------
        + New in DWARF Version 5
    
=====================================================================

(9) In Appendix G, add this line in the table:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    .debug_line_str    -   -   -    *
---------------------------------------------------------------------

=====================================================================

(10) In Appendix B, add the following in Figure B.1 emanating from
((.debug_line)):

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    -> [[ DW_FORM_line_strp (r) ]] -> (( .debug_line_str ))
---------------------------------------------------------------------
    where -> indicates an arrow, [[]] indicates a square box and (())
    indicates an oval box,
   
    and in the Notes add:
   
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    (r) Path contents in a line number table header can be represented
        by an offset to the beginning of a string in the .debug_line_str
        section using DW_FORM_line_strp.
       
        DW_FORM_line_str may also be used for attributes in the .debug_info
        section, notably DW_AT_comp_dir (not shown).
---------------------------------------------------------------------

Drawing an arrow from the .debug_line oval to the .debug_line_str oval
is not practical. Should I try to invent some kind of connector bubble,
or just let the the above note suffice?

=====================================================================

(11) Add .debug_line_str to the lists in Sections 1.4, 6.2, 7.3.2 and 7.29.

=====================================================================

(12) In Table 7.6, add
 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    DW_FORM_data16 +        0x1c
---------------------------------------------------------------------

=====================================================================

(13) In Appendix Section F.1, at the end of the bullet for
 .debug_line.dwo,  append the following:
 
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
         In a .dwo file there is no benefit to having a separate string
         section for directories and file names because the primary
         string table will never be stripped. Accordingly, no
         .debug_line_str.dwo is defined. Content descriptions corresponding
         to DW_FORM_line_str in an executable file (for example, in the
         skeleton compilation unit) instead use DW_FORM_strx. This allows
         directory and file name strings to be merged with general
         strings and across compilations in package files (which are not
         subject to potential stripping).

--
Revised 9/9/14.  Previous version: http://dwarfstd.org/issues/140724.1-1.html
Accepted with modifications 9/26/14:
  File zero is primary source file.
  Define_file and define_file_md5 deprecated, codes reserved.