dictfmt (1)
NAME
dictfmt - formats a DICT protocol dictionary database
SYNOPSIS
dictfmt -c5|-t|-e|-f|-h|-j|-p [options] basename
DESCRIPTION
dictfmt takes a file, FILE, on stdin and creates a dictionary database
named basename.dict that conforms to the DICT protocol. It also creates
an index file named basename.index. By default, the index is sorted
according to the C locale, and only alphanumeric characters and spaces
are used in sorting; this may be changed with the --locale and
--allchars options. (basename is commonly chosen to correspond to the
basename of FILE, but this is not mandatory.)
Unless the database is extremely small, it is highly recommended that
basename.dict be compressed with /usr/bin/dictzip to create
basename.dict.dz. (dictzip is included in the dictd source package.)
FILE may be in any of the several formats described by
the format options -c5, -t, -e, -f, -h, -j, or -p. Exactly one of
these options must be given.
dictfmt prepends several headers to the .dict file. The 00-database-url
header gives the value of the -u option as the URL of the site from
which the original database was obtained. The 00-database-short
header gives the value of the -s option as the short name of the
dictionary. (This "short name" is the identifying name given by the
"dict -D" option.) If the -u and/or -s options are omitted, these
values will be shown as "unknown", which is undesirable for a publicly
distributed database.
The date of conversion (formatting) is given in the 00-database-info
header. All text in the input file prior to the first headword is
appended to this header. All text in the input file following a
headword, up to the next headword, is copied unchanged to the .dict
file.
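The workflow described above (format with dictfmt, then compress with
dictzip) can be sketched in Python as follows. This is only an
illustration that assembles the two command lines without running them.
The helper name build_commands, the URL and the short name are
hypothetical; only the format options, -u, -s and the basename argument
are taken from this page.

```python
import shlex

# Format options listed in the SYNOPSIS; exactly one must be given.
FORMAT_OPTIONS = {"-c5", "-t", "-e", "-f", "-h", "-j", "-p"}

def build_commands(basename, format_flag, url, short_name):
    """Return (dictfmt_args, dictzip_args) for one database (hypothetical helper)."""
    if format_flag not in FORMAT_OPTIONS:
        raise ValueError(f"unknown format option: {format_flag!r}")
    dictfmt_cmd = ["dictfmt", format_flag, "-u", url, "-s", short_name, basename]
    dictzip_cmd = ["dictzip", basename + ".dict"]
    return dictfmt_cmd, dictzip_cmd

# Hypothetical sample values, not taken from a real database.
fmt_cmd, zip_cmd = build_commands(
    "foldoc", "-f", "http://example.org/foldoc", "Example FOLDOC copy"
)
print(shlex.join(fmt_cmd) + " < FILE")  # dictfmt reads FILE on stdin
print(shlex.join(zip_cmd))
```

To actually run the commands, the argument lists could be passed to
subprocess.run, with the input file opened and supplied as stdin for
the dictfmt step.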
FORMATTING OPTIONS
-c5
FILE is formatted with headwords preceded by 5 or more underscore
characters (_) and a blank line. All text until the next headword is
considered the definition. Any leading `@'
characters are stripped out, but the file is otherwise unchanged. This
option was written to format the CIA WORLD FACTBOOK 1995.
-t
The -c5, --without-info and --without-headword options are implied.
Use this option if the input database comes from the dictunformat
utility.
-e
FILE is in HTML format, with the headword tagged as bold
(<B>headword - </B>).
This option was written to format EASTON'S 1897 BIBLE DICTIONARY. A
typical entry from Easton is:
<A NAME="T0000005">
<B>Abagtha - </B>
one of the seven eunuchs in Ahasuerus's court (Esther 1:10;
2:21).
This is converted to:
Abagtha
one of the seven eunuchs in Ahasuerus's court (Esther 1:10;
2:21).
The anchor tag <A NAME="T0000005"> is omitted, and the headword
`Abagtha' is indexed.
NOTE:
This option should be used with caution. It removes several html tags
(enough to format Easton properly), but not all. The Makefile that
was originally written to format dict-easton uses sed scripts to
modify certain cross reference tags. It may be necessary to pipe the
input file through a sed script, or hack the source of dictfmt in
order to properly format other html databases.
-f
FILE is formatted with the headwords starting in column 0, with the
definition indented at least one space (or tab character) on subsequent
lines. The third line starting in column 0 is taken as the first
headword, and the first two lines starting in column 0 are treated as
part of the 00-database-info header. This option was written to format
the F.O.L.D.O.C.
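The -f layout rules above can be illustrated with a short Python
sketch. It is not dictfmt's implementation, only a reading of the
description: the function name split_f_format and the sample text are
hypothetical, and blank lines are assumed to belong to the current
definition.

```python
def split_f_format(text):
    """Split -f style text into (header_lines, entries).

    entries is a list of (headword, definition_lines) pairs.
    """
    header, entries = [], []
    column0_seen = 0
    for line in text.splitlines():
        starts_in_col0 = bool(line) and line[0] not in " \t"
        if starts_in_col0:
            column0_seen += 1
            if column0_seen <= 2:
                header.append(line)          # first two column-0 lines: header
            else:
                entries.append((line, []))   # third and later: headwords
        elif entries:
            entries[-1][1].append(line)      # indented or blank: definition
        else:
            header.append(line)              # text before the first headword
    return header, entries

# Invented sample input, for illustration only.
sample = (
    "Title line\n"
    "Second info line\n"
    "foo\n"
    "\tfirst definition\n"
    "bar\n"
    "    second definition\n"
)
header, entries = split_f_format(sample)
print(header)
print([headword for headword, _ in entries])
```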
-h
FILE is formatted with the headwords starting in column 0, followed by
a comma, with the definition continuing on the same line. All text
before the first single-character line is included in the
00-database-info header, and lines with only one character are omitted
from the .dict file. The first headword is on the line following the
first single-character line. This option was written to format
HITCHCOCK'S BIBLE NAMES DICTIONARY. The headword is indexed; the text
of the file is not changed.
-j
FILE is formatted with headwords starting in column 0, enclosed in
colons, followed by the definition. This option was written to format
the JARGON FILE. The colons surrounding the headword are removed, and
the headword is indexed. Lines beginning with '*', '=', or '-' are also
removed.
All text before the first headword is included in the headers.
NOTE:
Some recent versions of the JARGON FILE had three blanks inserted
before the first colon at each headword. These must be removed before
processing with dictfmt. (sed scripts have been used for this
purpose. ed, awk, or perl scripts are also possible.)
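As one possible alternative to the sed scripts mentioned in the note, a
Python sketch of the same preprocessing step is shown below. It assumes
that an affected headword line starts with exactly three blanks
followed by a colon-delimited headword; the function name
strip_headword_indent and the sample text are invented for
illustration.

```python
import re

# Three blanks at the start of a line, directly followed by :headword:
# (the lookahead keeps the headword itself untouched).
_INDENTED_HEADWORD = re.compile(r"^ {3}(?=:[^:\n]+:)", re.MULTILINE)

def strip_headword_indent(text):
    """Remove the three leading blanks in front of colon-delimited headwords."""
    return _INDENTED_HEADWORD.sub("", text)

# Invented sample text; indented definition lines are left alone.
sample = "   :hacker: n.\n   A person who enjoys exploring.\n"
cleaned = strip_headword_indent(sample)
print(cleaned, end="")
```

The cleaned text can then be fed to dictfmt -j on stdin.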
-p
FILE is formatted with `%h' in column 0, followed by a blank, followed
by the headword, optionally followed by a line containing `%d' in
column 0. The definition starts on the following line. The first line
beginning `%h' and any lines beginning `%d' are stripped from the .dict
file, and `%h ' is stripped from in front of the headword. All
text before the first headword is included in the headers.
This option was written to format Jay Kominek's elements database.
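A small Python sketch of how input in the -p layout might be generated
from existing data follows. The function name to_p_format and the
sample entries are hypothetical; the layout itself (`%h ' plus
headword, a `%d' line, then the definition) follows the description
above.

```python
def to_p_format(entries, preamble=""):
    """Render (headword, definition) pairs in the -p input layout."""
    lines = []
    if preamble:
        lines.append(preamble)           # text before the first headword
    for headword, definition in entries:
        lines.append(f"%h {headword}")   # `%h' in column 0, blank, headword
        lines.append("%d")               # optional `%d' line
        lines.append(definition)         # definition starts on the next line
    return "\n".join(lines) + "\n"

# Invented sample data, for illustration only.
text = to_p_format(
    [("hydrogen", "Symbol: H"), ("helium", "Symbol: He")],
    preamble="Sample elements list",
)
print(text, end="")
```

The resulting text would be passed to dictfmt -p on stdin.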
OPTIONS
-u url
Specifies the URL of the site from which the raw database was obtained.
If this option is specified, any 00-database-url (or 00databaseurl)
headword in the input, together with its definition, is ignored.
-s name
Specifies the name and, optionally, the version and date, of the
database. (If this contains spaces, it must be quoted.)
If this option is specified, any 00-database-short (or 00databaseshort)
headword in the input, together with its definition, is ignored.
-L
display license and copyright information
-V
display version information
-D
output debugging information
--help
display a help message
--locale locale
specifies the locale used for sorting. If no locale is specified, the
"C" locale is used.
--allchars
use all characters (not only alphanumeric and space) in sorting the index
--headword-separator sep
sets the headword separator, which allows several words to share the
same definition. For example, if '--headword-separator %%%' is given,
and the input file contains 'autumn%%%fall', both 'autumn' and 'fall'
will be indexed as headwords, with the same definition. This
option implies the --without-headword option.
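The effect of --headword-separator on indexing can be pictured with the
following Python sketch. It models the index as a plain dictionary and
uses a simple string split, which is an assumption and not a
description of dictfmt's internals; the function name build_index and
the definition text are hypothetical.

```python
def build_index(entries, separator):
    """Map every headword on a separated headword line to its definition."""
    index = {}
    for headword_line, definition in entries:
        for word in headword_line.split(separator):
            if word:                      # skip empty pieces
                index[word] = definition
    return index

# 'autumn%%%fall' is the example from the option text; the definition is invented.
index = build_index([("autumn%%%fall", "the season after summer")], "%%%")
print(sorted(index))
```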
--without-headword
headwords will not be included in the .dict file
--without-header
header will not be copied to DB info entry
--without-url
URL will not be copied to DB info entry
--without-time
time of creation will not be copied to DB info entry
--without-info
DB info entry will not be created.
This may be useful if 00-database-info headword
is expected from stdin (dictunformat outputs it).
CREDITS
dictfmt was written by Rik Faith (faith@cs.unc.edu) as part of the
dict-misc package. dictfmt is distributed under the terms of the GNU
General Public License. If you need to distribute under other terms,
write to the author.