HXUNENT
Section: HTML-XML-utils (1)
Updated: 10 Jul 2011
NAME
hxunent - replace HTML predefined character entities by UTF-8
SYNOPSIS
hxunent [ -b ] [ -f ] [ file ]
DESCRIPTION
The hxunent command reads the file (or standard input) and copies it to
standard output with &-entities replaced by their equivalent character
(encoded as UTF-8). E.g., &quot; is replaced by " and &lt; is replaced by <.
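The replacement described above can be illustrated with a minimal Python sketch. It is not hxunent's implementation: it relies on the standard library's html.unescape, which follows HTML5 rules and may differ from hxunent in details such as numeric references or entities written without a semicolon. The function name unent is hypothetical.

```python
import html

def unent(text: str) -> str:
    """Replace character entities by the characters they denote.

    Assumption: html.unescape (HTML5 rules) is close enough to
    illustrate the idea; it is not hxunent's own algorithm.
    """
    return html.unescape(text)

# Encode the result as UTF-8 bytes, as hxunent does on its output.
utf8_bytes = unent("caf&eacute; &lt;b&gt;").encode("utf-8")
print(utf8_bytes)
```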
OPTIONS
The following options are supported:
- -b
-
The five built-in entities of XML (&lt; &gt; &quot; &apos; &amp;) are not
replaced but copied unchanged. This is necessary if the output has to
be valid XML or SGML.
- -f
-
This option changes how unknown entities or lone ampersands are handled. Normally they are copied unchanged, but this option tries to "fix" them by replacing ampersands by &amp;. Such stray ampersands often result from copying and pasting URLs into a document, in which case this option does fix them and makes the document valid.
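The effect of the two options can be sketched in Python as follows. This is an illustration under stated assumptions, not hxunent's source: the function unent and its parameters keep_builtins (for -b) and fix (for -f) are hypothetical names, the entity table is Python's HTML 4 list plus apos, and numeric references such as &#38; are simply copied unchanged.

```python
import re
from html.entities import name2codepoint

# The five XML built-in entities that -b leaves alone.
XML_BUILTINS = {"lt", "gt", "quot", "apos", "amp"}

# Assumption: HTML 4 names plus "apos" stand in for "entities as defined by HTML".
KNOWN = dict(name2codepoint, apos=39)

# An ampersand, optionally followed by a complete reference ("name;" or "#digits;").
TOKEN = re.compile(r"&(?:(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)?")

def unent(text: str, keep_builtins: bool = False, fix: bool = False) -> str:
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name is None:
            # Lone ampersand: copied unchanged, or escaped when fixing.
            return "&amp;" if fix else "&"
        if name.startswith("#"):
            # Simplification of this sketch: numeric references pass through.
            return match.group(0)
        if keep_builtins and name in XML_BUILTINS:
            return match.group(0)
        if name in KNOWN:
            return chr(KNOWN[name])
        # Unknown entity: copied unchanged, or its ampersand escaped when fixing.
        return "&amp;" + name + ";" if fix else match.group(0)
    return TOKEN.sub(repl, text)
```

With keep_builtins set, markup characters stay escaped while other entities are still decoded; with fix set, an ampersand pasted in as part of a URL query string comes out as &amp;.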
DIAGNOSTICS
The program's exit value is 0 if all went well, otherwise:
- 1
-
The input couldn't be read (file not found, file not readable...)
- 2
-
Wrong command line arguments.
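A calling script might check these exit values along the following lines. This is only a sketch: it assumes hxunent is installed and on the PATH, and the helper names describe_exit and run_hxunent are hypothetical.

```python
import subprocess

# Meanings taken from the DIAGNOSTICS section.
EXIT_MEANINGS = {
    0: "all went well",
    1: "the input couldn't be read",
    2: "wrong command line arguments",
}

def describe_exit(code: int) -> str:
    """Translate an hxunent exit value into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit value {code}")

def run_hxunent(path: str, *options: str) -> str:
    """Run hxunent on `path` and return its UTF-8 output as text.

    Assumes the hxunent binary is available on the PATH.
    """
    result = subprocess.run(["hxunent", *options, path], capture_output=True)
    if result.returncode != 0:
        raise RuntimeError("hxunent failed: " + describe_exit(result.returncode))
    return result.stdout.decode("utf-8")
```

For example, run_hxunent("page.html", "-b") would raise RuntimeError with the "couldn't be read" message if page.html does not exist.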
SEE ALSO
asc2xml(1), xml2asc(1), UTF-8 (RFC 2279)
BUGS
The program assumes entities are as defined by HTML. It doesn't read a
document's DTD to find the actual definitions in use in a document.
With -f, it will even remove all entities that are not HTML entities.