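Tidy: a library for converting HTML to XHTML [compiled notes]

Tidy was originally developed by Dave Raggett and distributed on the W3C website under an open source license. It is now maintained by a group of volunteers on SourceForge.
Tidy can parse and format HTML. It is an excellent HTML parsing engine, and it was originally designed to automatically correct errors and sloppy markup in HTML.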

The Tidy project is available at ( http://tidy.sourceforge.net/ ). Its most recent update was in September 2008.

Here are some resources about Tidy:

1. A detailed introduction to Tidy by its original designer, Raggett.
   ( http://www.w3.org/People/Raggett/tidy/ )

2. Tip: Convert from HTML to XML with HTML Tidy.
   ( http://www.ibm.com/developerworks/cn/xml/x-tiptidy/#resources )

3. JTidy is a Java port of HTML Tidy. It provides an HTML syntax checker and a pretty printer. Its last update was in January 2001.
   ( http://jtidy.sourceforge.net/ )

4. Ntidy is a .NET wrapper built on top of Tidy. Its last update was in November 2004.
   ( http://sourceforge.net/projects/ntidy/ )

Other related resources:

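1. NekoHTML is a simple HTML scanner and tag balancer that lets a program parse HTML documents and access their contents through standard XML interfaces. Its last update was in 2008. (Note: NekoHTML is an open source Java project.)
   ( http://sourceforge.net/projects/nekohtml )

2. Html2xhtmlCleaner converts HTML into valid XHTML and also provides tag and attribute filtering. It is an open source .NET project on CodeProject.
   ( http://www.codeproject.com/KB/cs/html2xhtmlcleaner.aspx )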

Reposted from: http://www.cnblogs.com/drizzlecrj/archive/2009/03/05/1403606.html

Addendum, configuration options:
http://tidy.sourceforge.net/docs/quickref.html

Quick Reference

HTML Tidy Configuration Options

Generated automatically with HTML Tidy released on 18 June 2008.

HTML, XHTML, XML
Diagnostics
Pretty Print
Character Encoding
Miscellaneous

HTML, XHTML, XML Options

Option                       Type       Default
add-xml-decl                 Boolean    no
add-xml-space                Boolean    no
alt-text                     String
anchor-as-name               Boolean    yes
assume-xml-procins           Boolean    no
bare                         Boolean    no
clean                        Boolean    no
css-prefix                   String
decorate-inferred-ul         Boolean    no
doctype                      DocType    auto
drop-empty-paras             Boolean    yes
drop-font-tags               Boolean    no
drop-proprietary-attributes  Boolean    no
enclose-block-text           Boolean    no
enclose-text                 Boolean    no
escape-cdata                 Boolean    no
fix-backslash                Boolean    yes
fix-bad-comments             Boolean    yes
fix-uri                      Boolean    yes
hide-comments                Boolean    no
hide-endtags                 Boolean    no
indent-cdata                 Boolean    no
input-xml                    Boolean    no
join-classes                 Boolean    no
join-styles                  Boolean    yes
literal-attributes           Boolean    no
logical-emphasis             Boolean    no
lower-literals               Boolean    yes
merge-divs                   AutoBool   auto
merge-spans                  AutoBool   auto
ncr                          Boolean    yes
new-blocklevel-tags          Tag names
new-empty-tags               Tag names
new-inline-tags              Tag names
new-pre-tags                 Tag names
numeric-entities             Boolean    no
output-html                  Boolean    no
output-xhtml                 Boolean    no
output-xml                   Boolean    no
preserve-entities            Boolean    no
quote-ampersand              Boolean    yes
quote-marks                  Boolean    no
quote-nbsp                   Boolean    yes
repeated-attributes          enum       keep-last
replace-color                Boolean    no
show-body-only               AutoBool   no
uppercase-attributes         Boolean    no
uppercase-tags               Boolean    no
word-2000                    Boolean    no

 

Diagnostics Options

Option                       Type       Default
accessibility-check          enum       0 (Tidy Classic)
show-errors                  Integer    6
show-warnings                Boolean    yes

 

Pretty Print Options

Option                       Type       Default
break-before-br              Boolean    no
indent                       AutoBool   no
indent-attributes            Boolean    no
indent-spaces                Integer    2
markup                       Boolean    yes
punctuation-wrap             Boolean    no
sort-attributes              enum       none
split                        Boolean    no
tab-size                     Integer    8
vertical-space               Boolean    no
wrap                         Integer    68
wrap-asp                     Boolean    yes
wrap-attributes              Boolean    no
wrap-jste                    Boolean    yes
wrap-php                     Boolean    yes
wrap-script-literals         Boolean    no
wrap-sections                Boolean    yes

 

Character Encoding Options

Option                       Type       Default
ascii-chars                  Boolean    no
char-encoding                Encoding   ascii
input-encoding               Encoding   latin1
language                     String
newline                      enum       Platform dependent
output-bom                   AutoBool   auto
output-encoding              Encoding   ascii

 

Miscellaneous Options

Option                       Type       Default
error-file                   String
force-output                 Boolean    no
gnu-emacs                    Boolean    no
gnu-emacs-file               String
keep-time                    Boolean    no
output-file                  String
quiet                        Boolean    no
slide-style                  String
tidy-mark                    Boolean    yes
write-back                   Boolean    no

 

 

HTML, XHTML, XML Options Reference

 

add-xml-decl

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
See also: char-encoding, output-encoding

This option specifies if Tidy should add the XML declaration when
outputting XML or XHTML. Note that if the input already includes an
<?xml … ?> declaration then this option will be ignored. If the
encoding for the output is different from “ascii”, one of the utf
encodings or “raw”, the declaration is always added as required by the
XML standard.
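
Boolean options throughout this reference list the spellings y/n, yes/no, t/f, true/false and 1/0. As a minimal sketch (not Tidy's own parser), the Python helper below normalizes those spellings. The name `parse_tidy_bool` and the case-insensitive matching are assumptions made for illustration.

```python
# Hypothetical helper, not part of Tidy: normalizes the Boolean spellings
# listed in this reference (y/n, yes/no, t/f, true/false, 1/0).
TRUE_WORDS = {"y", "yes", "t", "true", "1"}
FALSE_WORDS = {"n", "no", "f", "false", "0"}


def parse_tidy_bool(value: str) -> bool:
    """Return True or False for one of the documented spellings."""
    word = value.strip().lower()  # case-insensitive matching is an assumption
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"not a documented Boolean spelling: {value!r}")
```

Rejecting anything outside the documented list keeps typos in a configuration from being silently read as "no".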

 

add-xml-space

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should add xml:space="preserve" to
elements such as <PRE>, <STYLE> and <SCRIPT> when
generating XML. This is needed if the whitespace in such elements is to
be parsed appropriately without having access to the DTD.

 

alt-text

Type: String
Default:
Example:

This option specifies the default “alt=” text Tidy uses for <IMG>
attributes. This feature is dangerous as it suppresses further
accessibility warnings. You are responsible for making your documents
accessible to people who can not see the images!

 

anchor-as-name

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option controls the deletion or addition of the name attribute in
elements where it can serve as anchor. If set to “yes”, a name
attribute, if not already existing, is added along an existing id
attribute if the DTD allows it. If set to “no”, any existing name
attribute is removed if an id attribute exists or has been added.

 

assume-xml-procins

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should change the parsing of processing
instructions to require ?> as the terminator rather than >. This
option is automatically set if the input is in XML.

 

bare

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should strip Microsoft specific HTML from
Word 2000 documents, and output spaces rather than non-breaking spaces
where they exist in the input.

 

clean

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
See also: drop-font-tags

This option specifies if Tidy should strip out surplus presentational
tags and attributes replacing them by style rules and structural markup
as appropriate. It works well on the HTML saved by Microsoft Office
products.

 

css-prefix

Type: String
Default:
Example:

This option specifies the prefix that Tidy uses for style rules. By
default, “c” will be used.

 

decorate-inferred-ul

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should decorate inferred UL elements with
some CSS markup to avoid indentation to the right.

 

doctype

Type: DocType
Default: auto
Example: omit, auto, strict, transitional, user

This option specifies the DOCTYPE declaration generated by Tidy. If set
to “omit” the output won’t contain a DOCTYPE declaration. If set to
“auto” (the default) Tidy will use an educated guess based upon the
contents of the document. If set to “strict”, Tidy will set the DOCTYPE
to the strict DTD. If set to “loose”, the DOCTYPE is set to the loose
(transitional) DTD. Alternatively, you can supply a string for the
formal public identifier (FPI).

For example:
doctype: "-//ACME//DTD HTML 3.14159//EN"

If you specify the FPI for an XHTML document, Tidy will set the system
identifier to an empty string. For an HTML document, Tidy adds a system
identifier only if one was already present in order to preserve the
processing mode of some browsers. Tidy leaves the DOCTYPE for generic
XML documents unchanged. --doctype omit implies
--numeric-entities yes. This option does not offer a validation of the
document conformance.
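
The `doctype:` example in this entry uses the `name: value` form of a Tidy configuration file. A small Python sketch that renders such a file from a dictionary might look like this. `render_tidy_config` is a hypothetical helper, and the sketch leaves any quoting of values to the caller.

```python
def render_tidy_config(options: dict) -> str:
    """Render options as 'name: value' lines, one option per line."""
    lines = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "yes" if value else "no"  # Boolean spelling from this reference
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


config_text = render_tidy_config({
    "output-xhtml": True,
    "doctype": '"-//ACME//DTD HTML 3.14159//EN"',  # FPI quoted as in the entry above
    "wrap": 68,
})
print(config_text)
```

The resulting text can be saved to a file and passed to the command-line tool with `tidy -config <file>`.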

 

drop-empty-paras

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should discard empty paragraphs.

 

drop-font-tags

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
See also: clean

This option specifies if Tidy should discard <FONT> and
<CENTER> tags without creating the corresponding style rules. This
option can be set independently of the clean option.

 

drop-proprietary-attributes

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should strip out proprietary attributes,
such as MS data binding attributes.

 

enclose-block-text

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should insert a <P> element to
enclose any text it finds in any element that allows mixed content for
HTML transitional but not HTML strict.

 

enclose-text

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should enclose any text it finds in the
body element within a <P> element. This is useful when you want to
take existing HTML and use it with a style sheet.

 

escape-cdata

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should convert <![CDATA[]]>
sections to normal text.

 

fix-backslash

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should replace backslash characters “\”
in URLs by forward slashes “/”.
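
The effect of fix-backslash on a single URL can be sketched in Python as follows. This only illustrates the substitution and is not Tidy's implementation; `fix_backslashes` is a hypothetical name.

```python
def fix_backslashes(url: str) -> str:
    """Replace every backslash in a URL with a forward slash."""
    return url.replace("\\", "/")


print(fix_backslashes(r"images\icons\logo.png"))  # images/icons/logo.png
```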

 

fix-bad-comments

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should replace unexpected hyphens with “=”
characters when it comes across adjacent hyphens. The default is yes.
This option is provided for users of Cold Fusion which uses the comment
syntax: <!--- --->

 

fix-uri

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should check attribute values that carry
URIs for illegal characters and if such are found, escape them as HTML 4
recommends.

 

hide-comments

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should omit comments from the output.

 

hide-endtags

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should omit optional end-tags when
generating the pretty printed markup. This option is ignored if you are
outputting to XML.

 

indent-cdata

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should indent <![CDATA[]]>
sections.

 

input-xml

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should use the XML parser rather than the
error correcting HTML parser.

 

join-classes

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
See also: join-styles, repeated-attributes

This option specifies if Tidy should combine class names to generate a
single new class name, if multiple class assignments are detected on an
element.

 

join-styles

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0
See also: join-classes, repeated-attributes

This option specifies if Tidy should combine styles to generate a single
new style, if multiple style values are detected on an element.

 

literal-attributes

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should ensure that whitespace characters
within attribute values are passed through unchanged.

 

logical-emphasis

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should replace any occurrence of <I>
by <EM> and any occurrence of <B> by <STRONG>. In both
cases, the attributes are preserved unchanged. This option can be set
independently of the clean and drop-font-tags options.

 

lower-literals

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should convert the value of an attribute
that takes a list of predefined values to lower case. This is required
for XHTML documents.

 

merge-divs

Type: AutoBool
Default: auto
Example: auto, y/n, yes/no, t/f, true/false, 1/0
See also: clean, merge-spans

Can be used to modify behavior of -c (--clean yes) option. This option
specifies if Tidy should merge nested <div> such as
“<div><div>…</div></div>”. If set to “auto”,
the attributes of the inner <div> are moved to the outer one. As
well, nested <div> with ID attributes are not merged. If set to
“yes”, the attributes of the inner <div> are discarded with the
exception of “class” and “style”.

 

merge-spans

Type: AutoBool
Default: auto
Example: auto, y/n, yes/no, t/f, true/false, 1/0
See also: clean, merge-divs

Can be used to modify behavior of -c (--clean yes) option. This option
specifies if Tidy should merge nested <span> such as
“<span><span>…</span></span>”. The algorithm
is identical to the one used by --merge-divs.

 

ncr

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should allow numeric character references.

 

new-blocklevel-tags

Type: Tag names
Default:
Example: tagX, tagY, …
See also: new-empty-tags, new-inline-tags, new-pre-tags

This option specifies new block-level tags. This option takes a space or
comma separated list of tag names. Unless you declare new tags, Tidy
will refuse to generate a tidied file if the input includes previously
unknown tags. Note you can’t change the content model for elements such
as <TABLE>, <UL>, <OL> and <DL>. This option is
ignored in XML mode.

 

new-empty-tags

Type: Tag names
Default:
Example: tagX, tagY, …
See also: new-blocklevel-tags, new-inline-tags, new-pre-tags

This option specifies new empty inline tags. This option takes a space
or comma separated list of tag names. Unless you declare new tags, Tidy
will refuse to generate a tidied file if the input includes previously
unknown tags. Remember to also declare empty tags as either inline or
blocklevel. This option is ignored in XML mode.

 

new-inline-tags

Type: Tag names
Default:
Example: tagX, tagY, …
See also: new-blocklevel-tags, new-empty-tags, new-pre-tags

This option specifies new non-empty inline tags. This option takes a
space or comma separated list of tag names. Unless you declare new tags,
Tidy will refuse to generate a tidied file if the input includes
previously unknown tags. This option is ignored in XML mode.

 

new-pre-tags

Type: Tag names
Default:
Example: tagX, tagY, …
See also: new-blocklevel-tags, new-empty-tags, new-inline-tags

This option specifies new tags that are to be processed in exactly the
same way as HTML’s <PRE> element. This option takes a space or
comma separated list of tag names. Unless you declare new tags, Tidy
will refuse to generate a tidied file if the input includes previously
unknown tags. Note you can not as yet add new CDATA elements (similar to
<SCRIPT>). This option is ignored in XML mode.

 

numeric-entities

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
See also: doctype, preserve-entities

This option specifies if Tidy should output entities other than the
built-in HTML entities (&amp;, &lt;, &gt; and &quot;) in the numeric
rather than the named entity form. Only entities compatible with the
DOCTYPE declaration generated are used. Entities that can be represented
in the output encoding are translated correspondingly.

 

output-html

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should generate pretty printed output,
writing it as HTML.

 

output-xhtml

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should generate pretty printed output,
writing it as extensible HTML. This option causes Tidy to set the
DOCTYPE and default namespace as appropriate to XHTML. If a DOCTYPE or
namespace is given they will be checked for consistency with the content of
the document. In the case of an inconsistency, the corrected values will
appear in the output. For XHTML, entities can be written as named or
numeric entities according to the setting of the “numeric-entities”
option. The original case of tags and attributes will be preserved,
regardless of other options.
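
Options in this reference can also be given on the command line in the `--name value` form that other entries use (for example `--doctype omit`). The Python sketch below builds such a command line for an HTML-to-XHTML conversion and runs it only when a `tidy` executable is found on PATH. The helper name `xhtml_command` and the file names are placeholders chosen for illustration.

```python
import os
import shutil
import subprocess


def xhtml_command(source: str, target: str) -> list:
    """Build a tidy command line that requests XHTML output."""
    return [
        "tidy",
        "--output-xhtml", "yes",   # write the result as XHTML
        "--output-file", target,   # send markup to this file instead of stdout
        "--tidy-mark", "no",       # do not add the "tidied" meta element
        source,
    ]


command = xhtml_command("page.html", "page.xhtml")
print(" ".join(command))

# Only run when a tidy executable is on PATH and the placeholder input exists.
if shutil.which("tidy") and os.path.exists("page.html"):
    subprocess.run(command, check=False)  # exit status is left to the caller
```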

 

output-xml

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should pretty print output, writing it as
well-formed XML. Any entities not defined in XML 1.0 will be written as
numeric entities to allow them to be parsed by an XML parser. The
original case of tags and attributes will be preserved, regardless of
other options.

 

preserve-entities

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should preserve the well-formed entities
as found in the input.

 

quote-ampersand

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should output unadorned & characters as
&amp;.
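
As an illustration of what quote-ampersand amounts to, the Python sketch below rewrites a bare & as the entity `&amp;` while leaving existing character references alone. Treating an ampersand as "unadorned" when it does not begin a named, decimal or hexadecimal reference is an assumption of this sketch, not a statement of Tidy's exact rule.

```python
import re

# A bare ampersand is one that does not start a named (&copy;), decimal
# (&#169;) or hexadecimal (&#xA9;) character reference. Assumption of
# this sketch; Tidy's own rule may differ in edge cases.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def quote_ampersands(text: str) -> str:
    """Escape bare ampersands and keep existing references intact."""
    return BARE_AMP.sub("&amp;", text)


print(quote_ampersands("Tom & Jerry &copy; 2008"))
```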

 

quote-marks

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should output " characters as &quot; as is
preferred by some editing environments. The apostrophe character ' is
written out as &#39; since many web browsers don’t yet support &apos;.

 

quote-nbsp

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should output non-breaking space
characters as entities, rather than as the Unicode character value 160
(decimal).

 

repeated-attributes

Type: enum
Default: keep-last
Example: keep-first, keep-last
See also: join-classes, join-styles

This option specifies if Tidy should keep the first or last attribute,
if an attribute is repeated, e.g. has two align attributes.
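
The two policies can be sketched in Python over a list of (name, value) pairs in source order. `dedupe_attributes` is a hypothetical helper, and comparing attribute names case-insensitively is an assumption of the sketch.

```python
def dedupe_attributes(attrs, policy="keep-last"):
    """Collapse repeated attributes according to keep-first or keep-last."""
    if policy not in ("keep-first", "keep-last"):
        raise ValueError(f"unknown policy: {policy}")
    chosen = {}
    for name, value in attrs:
        key = name.lower()  # case-insensitive comparison is an assumption
        if policy == "keep-last" or key not in chosen:
            chosen[key] = value
    return list(chosen.items())


attrs = [("align", "left"), ("class", "x"), ("align", "right")]
print(dedupe_attributes(attrs, "keep-first"))
print(dedupe_attributes(attrs, "keep-last"))
```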

 

replace-color

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should replace numeric values in color
attributes by HTML/XHTML color names where defined, e.g. replace
“#ffffff” with “white”.
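
A lookup of this kind can be sketched in Python with the sixteen color names defined by HTML 4. Whether Tidy uses exactly this table is an assumption; `replace_color` is a hypothetical name.

```python
# The sixteen color names defined by HTML 4, keyed by their hex values.
HTML4_COLORS = {
    "#000000": "black", "#c0c0c0": "silver", "#808080": "gray",
    "#ffffff": "white", "#800000": "maroon", "#ff0000": "red",
    "#800080": "purple", "#ff00ff": "fuchsia", "#008000": "green",
    "#00ff00": "lime", "#808000": "olive", "#ffff00": "yellow",
    "#000080": "navy", "#0000ff": "blue", "#008080": "teal",
    "#00ffff": "aqua",
}


def replace_color(value: str) -> str:
    """Return the color name for a known hex value, else the value unchanged."""
    return HTML4_COLORS.get(value.strip().lower(), value)


print(replace_color("#FFFFFF"))  # white
```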

 

show-body-only

Type: AutoBool
Default: no
Example: auto, y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should print only the contents of the body
tag as an HTML fragment. If set to “auto”, this is performed only if the
body tag has been inferred. Useful for incorporating existing whole
pages as a portion of another page. This option has no effect if XML
output is requested.

 

uppercase-attributes

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should output attribute names in upper
case. The default is no, which results in lower case attribute names,
except for XML input, where the original case is preserved.

 

uppercase-tags

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should output tag names in upper case. The
default is no, which results in lower case tag names, except for XML
input, where the original case is preserved.

 

word-2000

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should go to great pains to strip out all
the surplus stuff Microsoft Word 2000 inserts when you save Word
documents as “Web pages”. Doesn’t handle embedded images or VML. You
should consider using Word’s “Save As: Web Page, Filtered”.

 

 

Diagnostics Options Reference

 

accessibility-check

Type: enum
Default: 0 (Tidy Classic)
Example: 0 (Tidy Classic), 1 (Priority 1 Checks), 2 (Priority 2
Checks), 3 (Priority 3 Checks)

This option specifies what level of accessibility checking, if any,
Tidy should do. Level 0 is equivalent to Tidy Classic’s accessibility
checking. For more information on Tidy’s accessibility checking, visit
the Adaptive Technology Resource Centre at the University of Toronto.

 

show-errors

Type: Integer
Default: 6
Example: 0, 1, 2, …

This option specifies the number Tidy uses to determine if further
errors should be shown. If set to 0, then no errors are shown.

 

show-warnings

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should show warnings. Setting it to no
suppresses them, which can be useful when a few errors are hidden in a
flurry of warnings.

 

 

Pretty Print Options Reference

 

break-before-br

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should output a line break before each
<BR> element.

 

indent

Type: AutoBool
Default: no
Example: auto, y/n, yes/no, t/f, true/false, 1/0
See also: indent-spaces

This option specifies if Tidy should indent block-level tags. If set to
“auto”, this option causes Tidy to decide whether or not to indent the
content of tags such as TITLE, H1-H6, LI, TD, or P depending on
whether or not the content includes a block-level element. You are
advised to avoid setting indent to yes as this can expose layout bugs in
some browsers.

 

indent-attributes

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should begin each attribute on a new line.

 

indent-spaces

Type: Integer
Default: 2
Example: 0, 1, 2, …
See also: indent

This option specifies the number of spaces Tidy uses to indent content,
when indentation is enabled.

 

markup

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should generate a pretty printed version
of the markup. Note that Tidy won’t generate a pretty printed version if
it finds significant errors (see force-output).

 

punctuation-wrap

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should line wrap after some Unicode or
Chinese punctuation characters.

 

sort-attributes

Type: enum
Default: none
Example: none, alpha

This option specifies that tidy should sort attributes within an element
using the specified sort algorithm. If set to “alpha”, the algorithm is
an ascending alphabetic sort.

 

split

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

Currently not used. Tidy Classic only.

 

tab-size

Type: Integer
Default: 8
Example: 0, 1, 2, …

This option specifies the number of columns that Tidy uses between
successive tab stops. It is used to map tabs to spaces when reading the
input. Tidy never outputs tabs.
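
Mapping tabs to spaces against fixed tab stops is what Python's built-in `str.expandtabs` does, so the idea can be sketched in a couple of lines. The function name is hypothetical, and this is only an analogy for how tab-size is applied to the input.

```python
def expand_input_tabs(line: str, tab_size: int = 8) -> str:
    """Replace each tab with spaces up to the next multiple of tab_size."""
    return line.expandtabs(tab_size)


# Print with repr() so the inserted spaces are visible.
print(repr(expand_input_tabs("a\tbc\td")))
print(repr(expand_input_tabs("a\tb", tab_size=4)))
```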

 

vertical-space

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should add some empty lines for
readability.

 

wrap

Type: Integer
Default: 68
Example: 0 (no wrapping), 1, 2, …

This option specifies the right margin Tidy uses for line wrapping. Tidy
tries to wrap lines so that they do not exceed this length. Set wrap to
zero if you want to disable line wrapping.
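
For plain text, the same margin rule can be imitated with Python's `textwrap` module. This is a rough sketch only: it ignores markup entirely, whereas Tidy wraps with knowledge of tags and attributes. `wrap_text` is a hypothetical name.

```python
import textwrap


def wrap_text(text: str, wrap: int = 68) -> str:
    """Wrap plain text at the given right margin; 0 disables wrapping."""
    if wrap == 0:
        return text
    return textwrap.fill(text, width=wrap)


sample = "Tidy tries to wrap lines so that they do not exceed this length. " * 3
print(wrap_text(sample))
```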

 

wrap-asp

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should line wrap text contained within ASP
pseudo elements, which look like: <% … %>.

 

wrap-attributes

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
See also: wrap-script-literals

This option specifies if Tidy should line wrap attribute values, for
easier editing. This option can be set independently of
wrap-script-literals.

 

wrap-jste

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should line wrap text contained within
JSTE pseudo elements, which look like: <# … #>.

 

wrap-php

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should line wrap text contained within PHP
pseudo elements, which look like: <?php … ?>.

 

wrap-script-literals

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
See also: wrap-attributes

This option specifies if Tidy should line wrap string literals that
appear in script attributes. Tidy wraps long script string literals by
inserting a backslash character before the line break.

 

wrap-sections

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should line wrap text contained within
<![ … ]> section tags.

 

 

Character Encoding Options Reference

 

ascii-chars

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
See also: clean

Can be used to modify behavior of -c (--clean yes) option. If set to
“yes” when using -c, &emdash;, &rdquo;, and other named character
entities are downgraded to their closest ascii equivalents.

 

char-encoding

Type: Encoding
Default: ascii
Example: raw, ascii, latin0, latin1, utf8, iso2022, mac, win1252,
ibm858, utf16le, utf16be, utf16, big5, shiftjis
See also: input-encoding, output-encoding

This option specifies the character encoding Tidy uses for both the
input and output. For ascii, Tidy will accept Latin-1 (ISO-8859-1)
character values, but will use entities for all characters whose value
> 127. For raw, Tidy will output values above 127 without translating
them into entities. For latin1, characters above 255 will be written as
entities. For utf8, Tidy assumes that both input and output is encoded
as UTF-8. You can use iso2022 for files encoded using the ISO-2022
family of encodings e.g. ISO-2022-JP. For mac and win1252, Tidy will
accept vendor specific character values, but will use entities for all
characters whose value > 127. For unsupported encodings, use an
external utility to convert to and from UTF-8.
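
The difference between the ascii, latin1 and utf8 settings on output can be imitated with Python's `xmlcharrefreplace` error handler, which writes any character the target encoding cannot hold as a numeric reference. This is an analogy rather than Tidy's code: Tidy may emit named entities where this sketch always emits numeric ones, and `encode_like` is a hypothetical name.

```python
# Map the Tidy encoding names used here to Python codec names.
CODECS = {"ascii": "ascii", "latin1": "latin-1", "utf8": "utf-8"}


def encode_like(text: str, encoding: str) -> bytes:
    """Encode text, writing unrepresentable characters as numeric references."""
    return text.encode(CODECS[encoding], errors="xmlcharrefreplace")


sample = "caf\u00e9 \u20ac5"  # e-acute (233) and the euro sign (8364)
for name in ("ascii", "latin1", "utf8"):
    print(name, encode_like(sample, name))
```

With ascii, both characters become references. With latin1, only the euro sign (above 255) does. With utf8, neither does.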

 

input-encoding

Type: Encoding
Default: latin1
Example: raw, ascii, latin0, latin1, utf8, iso2022, mac, win1252,
ibm858, utf16le, utf16be, utf16, big5, shiftjis
See also: char-encoding

This option specifies the character encoding Tidy uses for the input.
See char-encoding for more info.

 

language

Type: String
Default:
Example:

Currently not used, but this option specifies the language Tidy uses
(for instance “en”).

 

newline

Type: enum
Default: Platform dependent
Example: LF, CRLF, CR

The default is appropriate to the current platform: CRLF on PC-DOS,
MS-Windows and OS/2, CR on Classic Mac OS, and LF everywhere else (Unix
and Linux).

 

output-bom

Type: AutoBool
Default: auto
Example: auto, y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should write a Unicode Byte Order Mark
character (BOM; also known as Zero Width No-Break Space; has value of
U+FEFF) to the beginning of the output; only for UTF-8 and UTF-16 output
encodings. If set to “auto”, this option causes Tidy to write a BOM to
the output only if a BOM was present at the beginning of the input. A
BOM is always written for XML/XHTML output using UTF-16 output
encodings.
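
The yes/no/auto decision for UTF-8 output can be sketched in Python as below. The sketch covers only the UTF-8 case and leaves out the rule about UTF-16 XML/XHTML output. `apply_output_bom` is a hypothetical helper that takes the raw input bytes and the already tidied, BOM-free output bytes.

```python
import codecs


def apply_output_bom(source: bytes, body: bytes, output_bom: str = "auto") -> bytes:
    """Prefix body with a UTF-8 BOM according to the output-bom setting."""
    had_bom = source.startswith(codecs.BOM_UTF8)
    write_bom = output_bom == "yes" or (output_bom == "auto" and had_bom)
    return (codecs.BOM_UTF8 if write_bom else b"") + body


with_bom = codecs.BOM_UTF8 + b"<p>hi</p>"
print(apply_output_bom(with_bom, b"<p>hi</p>", "auto"))
print(apply_output_bom(b"<p>hi</p>", b"<p>hi</p>", "auto"))
```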

 

output-encoding

Type: Encoding
Default: ascii
Example: raw, ascii, latin0, latin1, utf8, iso2022, mac, win1252,
ibm858, utf16le, utf16be, utf16, big5, shiftjis
See also: char-encoding

This option specifies the character encoding Tidy uses for the output.
See char-encoding for more info. May only be different from
input-encoding for Latin encodings (ascii, latin0, latin1, mac, win1252,
ibm858).

 

 

Miscellaneous Options Reference

 

error-file

Type: String
Default:
Example:
See also: output-file

This option specifies the error file Tidy uses for errors and warnings.
Normally errors and warnings are output to “stderr”.

 

force-output

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should produce output even if errors are
encountered. Use this option with care – if Tidy reports an error, this
means Tidy was not able to, or is not sure how to, fix the error, so the
resulting output may not reflect your intention.

 

gnu-emacs

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should change the format for reporting
errors and warnings to a format that is more easily parsed by GNU Emacs.

 

gnu-emacs-file

Type: String
Default:
Example:

Used internally.

 

keep-time

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should keep the original modification time
of files that Tidy modifies in place. The default is no. Setting the
option to yes allows you to tidy files without causing these files to be
uploaded to a web server when using a tool such as SiteCopy. Note this
feature is not supported on some platforms.

 

output-file

Type: String
Default:
Example:
See also: error-file

This option specifies the output file Tidy uses for markup. Normally
markup is written to “stdout”.

 

quiet

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should suppress the summary of the numbers
of errors and warnings, as well as the welcome and informational
messages.

 

slide-style

Type: String
Default:
Example:

Currently not used. Tidy Classic only.

 

tidy-mark

Type: Boolean
Default: yes
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should add a meta element to the document
head to indicate that the document has been tidied. Tidy won’t add a
meta element if one is already present.

 

write-back

Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This option specifies if Tidy should write back the tidied markup to the
same file it read from. You are advised to keep copies of important
files before tidying them, as on rare occasions the result may not be
what you expect.

 

 
