HXEXTRACT
Section: HTML-XML-utils (1)
Updated: 10 Jul 2011
NAME
hxextract - extract selected elements from a HTML or XML file
SYNOPSIS
hxextract [ -h | -? ] [ -x ] [ -s text ] [ -e text ] [ -b base ] element-or-class [ -c configfile | file-or-URL ]
DESCRIPTION
hxextract outputs all elements that have a given name, a given class, or both.
The input must be well-formed, since no HTML heuristics are applied.
OPTIONS
The following options are supported:
-x
    Use XML format conventions.
-s text
    Insert text at the start of the output.
-e text
    Insert text at the end of the output.
-b base
    URL base.
-c configfile
    Read @chapter lines from configfile (lines must be of the form "@chapter filename") and extract elements from each of those files.
-h, -?
    Print command usage.
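The -c, -s and -e options can be combined as in the following sketch. The file names (intro.html, details.html, chapters.cfg) and the wrapper text are hypothetical, and it is an assumption that file names in the config file are resolved relative to the current directory, which is why the command runs from inside the working directory.

```shell
# Hypothetical working directory with two small, well-formed chapter files.
workdir=$(mktemp -d)
printf '<HTML><BODY><H2>Intro</H2></BODY></HTML>\n' > "$workdir/intro.html"
printf '<HTML><BODY><H2>Details</H2></BODY></HTML>\n' > "$workdir/details.html"

# One "@chapter filename" line per file, as described for -c.
config="$workdir/chapters.cfg"
printf '@chapter %s\n' intro.html details.html > "$config"

# Extract the H2 elements of every listed file and wrap the output
# in the text given with -s and -e. Skipped if hxextract is missing.
if command -v hxextract >/dev/null 2>&1; then
  (cd "$workdir" && hxextract -s '<div class="headings">' -e '</div>' H2 -c chapters.cfg)
else
  echo "hxextract is not installed; skipping" >&2
fi
```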
OPERANDS
The following operands are supported:
element-or-class
    The name of an element to extract (e.g., "H2"), or the name of a class preceded by "." (e.g., ".example"), or a combination of both (e.g., "H2.example").
file-or-URL
    A file name or a URL. To read from standard input, use "-".
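The three forms of the element-or-class operand, and "-" for standard input, might be used as follows. The sample document is made up for illustration; the comments describe what each call is expected to select according to the operand description above, not verified output.

```shell
# Create a small, well-formed sample document (hypothetical content).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.html"
cat > "$sample" <<'EOF'
<HTML><HEAD><TITLE>Sample</TITLE></HEAD>
<BODY>
<H2 CLASS="example">First heading</H2>
<P CLASS="example">A paragraph.</P>
<H2>Second heading</H2>
</BODY></HTML>
EOF

if command -v hxextract >/dev/null 2>&1; then
  hxextract H2 "$sample"           # by element name: both H2 elements
  hxextract .example "$sample"     # by class: elements with class "example"
  hxextract H2.example "$sample"   # both: H2 elements with class "example"
  hxextract H2 - < "$sample"       # "-" reads the document from standard input
else
  echo "hxextract is not installed; skipping" >&2
fi
```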
ENVIRONMENT
To use a proxy to retrieve remote files, set the environment variables http_proxy and ftp_proxy. E.g.,
    http_proxy=http://localhost:8080/
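In a shell session, the proxy setting might look like the sketch below. The helper name fetch_headings and the URL in the comment are hypothetical, and the proxy address is the one from the example above.

```shell
# Export the proxy so that it is visible to child processes such as hxextract.
# ftp_proxy would be set the same way.
export http_proxy=http://localhost:8080/

# Hypothetical helper: extract all H2 elements from the URL given as $1.
fetch_headings() {
  hxextract H2 "$1"
}

# Example call (needs hxextract, a running proxy, and network access):
# fetch_headings http://example.org/page.html
```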
BUGS
Remote files (specified with a URL) are currently only supported for HTTP. Password-protected files or files that depend on HTTP "cookies" are not handled. (You can use tools such as curl(1) or wget(1) to retrieve such files.)
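One way to apply that workaround is to let curl(1) fetch the file and pass it to hxextract on standard input with "-". The function names, the credentials format, and the cookie file are assumptions made for illustration; the retrieved document still has to be well-formed.

```shell
# Hypothetical helper for password-protected files.
# $1: element-or-class, $2: URL, $3: credentials as "user:password"
extract_protected() {
  curl -s -u "$3" "$2" | hxextract "$1" -
}

# Hypothetical helper for files that depend on cookies.
# $1: element-or-class, $2: URL, $3: cookie file to send with the request
extract_with_cookies() {
  curl -s -b "$3" "$2" | hxextract "$1" -
}

# Example calls (need curl, hxextract, and network access):
# extract_protected H2 http://example.org/private.html alice:secret
# extract_with_cookies .example http://example.org/page.html cookies.txt
```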
SEE ALSO
hxselect(1)