Pete’s Guide to Technology and Everything Else

Pete’s Electronic Publishing Expert Set

It is easy to see that the typesetting in almost any electronic document (including Web pages, eBooks, and PDF literature) falls well short of the high standards that the print publishing world, and book publishing in particular, is known for and rightly proud of.

There are many reasons for this, but one of them is that the set of characters defined in the official HTML standards is missing a significant number of characters that are necessary for the proper creation of documents. A related problem is that most Web browsers (and currently all eBook readers) do not even support the full set of characters specified by HTML. And most computer fonts are lacking these characters as well.
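As a rough illustration of that gap, the following Python sketch checks a handful of typographic characters against the HTML 4 named entities that ship in Python's standard `html.entities` module. The sample characters and the `missing_named_entities` helper are my own picks for illustration only; they are not taken from the proposed list.

```python
from html.entities import codepoint2name  # covers the HTML 4 named entities only

# Sample characters chosen for illustration; not the proposal's actual list.
SAMPLES = {
    "em dash": "\u2014",
    "fi ligature": "\ufb01",
    "fl ligature": "\ufb02",
    "figure dash": "\u2012",
    "hair space": "\u200a",
}


def missing_named_entities(samples):
    """Return the sample names whose character has no HTML 4 named entity."""
    return [name for name, ch in samples.items() if ord(ch) not in codepoint2name]


if __name__ == "__main__":
    for name in missing_named_entities(SAMPLES):
        print(f"no HTML 4 named entity for: {name}")
```

Of these five samples, only the em dash has a named entity in HTML 4; the rest can be written only as numeric character references, and only if the font in use happens to contain them.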

The only way to solve these problems is to specify an additional set of characters that Web browsers, eBook readers, and any similar program, device, or computer system must support in its entirety. This may seem like overkill, given that many of the characters are used only rarely, but the nature of electronic distribution and rendering systems makes it absolutely necessary. Someone typesetting a hardcover book can mix the necessary characters from many different font files: the end result is simply ink on paper, so the book never needs to carry every character that might have been used. Electronic reading systems are different. For these rarely used characters to be available in a document, the reading system must already contain all of them before it receives the document; otherwise they will appear incorrectly (as question marks, empty squares, completely wrong characters, or other strange constructs). And the author has no way of knowing in advance whether a particular character will be supported on all of the desired target platforms without testing each and every one.
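The testing burden described above can be sketched in a few lines of Python. This is only a model: the `unsupported_characters` helper and the ASCII-only reading system are hypothetical, and a real check would have to query each target platform's actual fonts.

```python
def unsupported_characters(text, supported):
    """Return the distinct characters in `text` that are not in `supported`,
    in order of first appearance."""
    missing = []
    for ch in text:
        if ch not in supported and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical reading system that carries only printable ASCII.
ASCII_ONLY = {chr(cp) for cp in range(0x20, 0x7F)}

# Sample text with an fi ligature, an em dash, and a one-third fraction.
SAMPLE = "The \ufb01rst em dash \u2014 and a \u2153 fraction"

if __name__ == "__main__":
    for ch in unsupported_characters(SAMPLE, ASCII_ONLY):
        print(f"U+{ord(ch):04X} would not display correctly")
```

An author would have to repeat a check like this for every reading system a document is meant for, which is exactly the burden a guaranteed character set would remove.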

Therefore, I present the list of characters detailed below, and propose that the current document standards, including HTML, XHTML, XML, Open eBook, and Adobe Acrobat, be amended to include all characters in the list, and that any reading system using these standards contain at least one font that has all of these characters, so that any character can be displayed even if it is not available in the preferred font for a particular document.
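The fallback behaviour this proposal asks for might look roughly like the sketch below. The font names, the `COVERAGE` table, and the `pick_font` function are all hypothetical; real reading systems read glyph coverage from the font files themselves.

```python
# Hypothetical coverage tables: font name -> set of characters it can draw.
COVERAGE = {
    "BodySerif": set("abcdefghijklmnopqrstuvwxyz "),
    "ExpertFallback": set("abcdefghijklmnopqrstuvwxyz \ufb01\u2014"),
}


def pick_font(ch, preferred, fallback, coverage=COVERAGE):
    """Return the preferred font if it has the character, otherwise the
    fallback font, or None if neither font contains it."""
    for font in (preferred, fallback):
        if ch in coverage.get(font, set()):
            return font
    return None


if __name__ == "__main__":
    for ch in "a\ufb01\u2153":
        font = pick_font(ch, "BodySerif", "ExpertFallback")
        print(f"U+{ord(ch):04X}: {font or 'no font available'}")
```

In this model the fi ligature falls through to the fallback font, while the one-third fraction, which neither hypothetical font contains, would still render as one of the strange constructs described earlier. A fallback font covering the whole list would close that last case.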

Expert set entity definition file
Pete’s Expert Set proposal: characters that are essential to producing good-looking documents. The list is a work in progress: I am still open to debate about whether characters should be added to or removed from it, and the entity names are not yet final.
Latin-1 entity definition file
One of three standard sets of characters defined in XHTML 1.0 (and HTML 4.0)
Symbol entity definition file
One of three standard sets of characters defined in XHTML 1.0 (and HTML 4.0)
Special entity definition file
One of three standard sets of characters defined in XHTML 1.0 (and HTML 4.0)

