HXEXTRACT

Section: HTML-XML-utils (1)
Updated: 10 Jul 2011
 

NAME

hxextract - extract selected elements from an HTML or XML file

SYNOPSIS

hxextract [ -h | -? ] [ -x ] [ -s text ] [ -e text ] [ -b base ] element-or-class [ -c configfile | file-or-URL ]  

DESCRIPTION

hxextract outputs all elements that have the given element name, the given class, or both.

Input must be well-formed, since no HTML heuristics are applied.  

OPTIONS

The following options are supported:
-x
Use XML format conventions.
-s text
Insert text at the start of the output.
-e text
Insert text at the end of the output.
-b base
Use base as the URL base.
-c configfile
Read @chapter lines from configfile (lines must be of the form "@chapter filename") and extract elements from each of those files.
-h, -?
Print command usage.
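
As an illustration of the configfile format accepted by -c, the following Python sketch collects the file names from "@chapter filename" lines. It is not part of hxextract. The function name chapter_files and the sample file names are hypothetical, and skipping lines of any other form is an assumption of this sketch, since this page does not say how hxextract treats them.

```python
from typing import Iterable, List


def chapter_files(lines: Iterable[str]) -> List[str]:
    """Return the file names found on '@chapter filename' lines."""
    files = []
    for line in lines:
        # Split into the keyword and the rest of the line.
        parts = line.strip().split(None, 1)
        # Assumption: lines of any other form are simply skipped.
        if len(parts) == 2 and parts[0] == "@chapter":
            files.append(parts[1].strip())
    return files


# Hypothetical configfile contents.
sample = """\
@chapter intro.html
@chapter usage.html
"""
print(chapter_files(sample.splitlines()))  # ['intro.html', 'usage.html']
```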
 

OPERANDS

The following operands are supported:
element-or-class
The name of an element to extract (e.g., "H2"), the name of a class preceded by "." (e.g., ".example"), or a combination of both (e.g., "H2.example").
file-or-URL
A file name or a URL. To read from standard input, use "-".
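
The three forms of the element-or-class operand can be pictured with a short Python sketch that splits the operand into an element name and a class name. This is only an illustration under stated assumptions, not hxextract's own parser: the function name split_operand is hypothetical, and splitting at the first "." only is an assumption, since this page does not describe operands containing more than one ".".

```python
from typing import Optional, Tuple


def split_operand(operand: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'H2', '.example' or 'H2.example' into (element, class)."""
    # Assumption: only the first "." separates element from class.
    element, dot, cls = operand.partition(".")
    if not dot:
        # No "." at all: the operand is just an element name.
        return (operand or None, None)
    return (element or None, cls or None)


for op in ("H2", ".example", "H2.example"):
    print(op, "->", split_operand(op))
```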
 

ENVIRONMENT

To retrieve remote files through a proxy, set the environment variables http_proxy and ftp_proxy. For example: http_proxy=http://localhost:8080/
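
When hxextract is started from a script, the proxy variable can be set for that one invocation only. The Python sketch below builds the environment and the argument list. The helper names proxied_env and hxextract_argv are hypothetical, and the sketch deliberately does not run the command itself.

```python
import os
from typing import Dict, List, Optional


def proxied_env(http_proxy: str,
                base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with http_proxy set."""
    # Copy so that the caller's environment is left untouched.
    env = dict(os.environ if base is None else base)
    env["http_proxy"] = http_proxy
    return env


def hxextract_argv(operand: str, source: str) -> List[str]:
    """Build the argument list for: hxextract element-or-class file-or-URL."""
    return ["hxextract", operand, source]
```

Assuming hxextract is installed, the two values could then be passed to subprocess.run(argv, env=env), for instance with argv = hxextract_argv("H2.example", "-") to read from standard input.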

BUGS

Remote files (specified with a URL) are currently supported only over HTTP. Password-protected files and files that depend on HTTP "cookies" are not handled. (You can use tools such as curl(1) or wget(1) to retrieve such files.)

SEE ALSO

hxselect(1)


 
