Issue 161027.1: .debug_names vs. DW_ID_case_insensitive

Author: Jan Kratochvil
Champion: Cary Coutant
Date submitted: 2016-10-27
Date revised:
Date closed:
Type: Error
Status: Accepted with modifications
DWARF Version: 5
Section 7.33, pg 248
I do not see how to index DW_AT_identifier_case=DW_ID_case_insensitivei
language names so that a consumer can find such case-insensitive &&
case-preserving names in the .debug_names index hash.

.gdb_index just always lowercases everything for the hashing.


PROPOSED CHANGES

Add the following in Section 6.1.1.4.5 ("Hash Lookup Table"), just
after line 5 on Page 145 of the 20161126 prelease draft:

-------------
For the purposes of the hash computation, each symbol name should be
folded according to the simple case folding algorithm defined in the
"Caseless Matching" subsection of Section 5.18 ("Case Mappings") of
the Unicode Standard, Version 9.0.0. The original symbol name, as it
appears in the source code, should be stored in the name table.
*Thus, two symbols that differ only by case will hash to
the same slot, but the consumer will be able to distinguish the names
when appropriate.*

--

Accepted with modifications -- 1/24/2017

The following text will be added:

The simple case folding algorithm is further described
in the CaseFolding.txt file distributed with the Unicode Character
Database. That file defines four classes of mappings: Common (C),
Simple (S), Full (F), and Turkish (T). The hash computation uses the C
+ S mappings only, which do not affect the total length of the
string, with the addition that Turkish dotted 'I' and undotted 'I', both
upper and lower case, are translated to the Latin lower case 'i'.