dcm2xml(1) OFFIS DCMTK dcm2xml(1)
NAME
dcm2xml - Convert DICOM file and data set to XML
SYNOPSIS
dcm2xml [options] dcmfile-in [xmlfile-out]
DESCRIPTION
The dcm2xml utility converts the contents of a DICOM file (file format
or raw data set) to XML (Extensible Markup Language). There are two
output formats. The first one is specific to DCMTK with its DTD
(Document Type Definition) described in the file dcm2xml.dtd. The
second one refers to the 'Native DICOM Model' which is specified for
the DICOM Application Hosting service found in DICOM part 19.
If dcm2xml reads a raw data set (DICOM data without a file format meta-
header) it will attempt to guess the transfer syntax by examining the
first few bytes of the file. It is not always possible to correctly
guess the transfer syntax and it is better to convert a data set to a
file format whenever possible (using the dcmconv utility). It is also
possible to use the -f and -t[ieb] options to force dcm2xml to read a
data set with a particular transfer syntax.
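
Whether an input has a file format meta-header can often be checked before calling dcm2xml. The following is a minimal sketch, not part of DCMTK: it assumes the usual DICOM file format layout of a 128-byte preamble followed by the four bytes "DICM". The function names are hypothetical, and the suggested options are taken from the OPTIONS section.

```python
PREAMBLE_LENGTH = 128   # bytes of preamble before the magic word
MAGIC = b"DICM"


def has_dicom_meta_header(data: bytes) -> bool:
    """Return True if the bytes start like a DICOM file (file format)."""
    return data[PREAMBLE_LENGTH:PREAMBLE_LENGTH + len(MAGIC)] == MAGIC


def input_options(path: str) -> list:
    """Suggest dcm2xml input options for the given file (illustrative)."""
    with open(path, "rb") as handle:
        head = handle.read(PREAMBLE_LENGTH + len(MAGIC))
    # +fo: read file format only; -f: read data set without meta information
    return ["+fo"] if has_dicom_meta_header(head) else ["-f"]
```

For a raw data set, a -t[ieb] option can be appended to the returned list if the transfer syntax is known in advance.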
PARAMETERS
dcmfile-in DICOM input filename to be converted
xmlfile-out XML output filename (default: stdout)
OPTIONS
general options
-h --help
print this help text and exit
--version
print version information and exit
--arguments
print expanded command line arguments
-q --quiet
quiet mode, print no warnings and errors
-v --verbose
verbose mode, print processing details
-d --debug
debug mode, print debug information
-ll --log-level [l]evel: string constant
(fatal, error, warn, info, debug, trace)
use level l for the logger
-lc --log-config [f]ilename: string
use config file f for the logger
input options
input file format:
+f --read-file
read file format or data set (default)
+fo --read-file-only
read file format only
-f --read-dataset
read data set without file meta information
input transfer syntax:
-t= --read-xfer-auto
use TS recognition (default)
-td --read-xfer-detect
ignore TS specified in the file meta header
-te --read-xfer-little
read with explicit VR little endian TS
-tb --read-xfer-big
read with explicit VR big endian TS
-ti --read-xfer-implicit
read with implicit VR little endian TS
long tag values:
+M --load-all
load very long tag values (e.g. pixel data)
-M --load-short
do not load very long values (default)
+R --max-read-length [k]bytes: integer (4..4194302, default: 4)
set threshold for long values to k kbytes
processing options
specific character set:
+Cr --charset-require
require declaration of extended charset (default)
+Ca --charset-assume [c]harset: string
assume charset c if no extended charset declared
+Cc --charset-check-all
check all data elements with string values
(default: only PN, LO, LT, SH, ST, UC and UT)
# this option is only used for the mapping to an appropriate
# XML character encoding, but not for the conversion to UTF-8
+U8 --convert-to-utf8
convert all element values that are affected
by Specific Character Set (0008,0005) to UTF-8
# requires support from an underlying character encoding library
# (see output of --version on which one is available)
output options
general XML format:
-dtk --dcmtk-format
output in DCMTK-specific format (default)
-nat --native-format
output in Native DICOM Model format (part 19)
+Xn --use-xml-namespace
add XML namespace declaration to root element
DCMTK-specific format (not with --native-format):
+Xd --add-dtd-reference
add reference to document type definition (DTD)
+Xe --embed-dtd-content
embed document type definition into XML document
+Xf --use-dtd-file [f]ilename: string
use specified DTD file (only with +Xe)
(default: /usr/local/share/dcmtk/dcm2xml.dtd)
+Wn --write-element-name
write name of the DICOM data elements (default)
-Wn --no-element-name
do not write name of the DICOM data elements
+Wb --write-binary-data
write binary data of OB and OW elements
(default: off, be careful with --load-all)
encoding of binary data:
+Eh --encode-hex
encode binary data as hex numbers
(default for DCMTK-specific format)
+Eu --encode-uuid
encode binary data as a UUID reference
(default for Native DICOM Model)
+Eb --encode-base64
encode binary data as Base64 (RFC 2045, MIME)
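
As an illustration of how the options above combine, the following sketch assembles an argument list for dcm2xml in Python. The helper names and keyword parameters are hypothetical; only option names listed above are used. Running the command assumes that dcm2xml is installed and on the PATH.

```python
import subprocess


def build_dcm2xml_command(dcmfile_in, xmlfile_out=None, *, native=False,
                          to_utf8=False, load_all=False, base64=False):
    """Assemble an argument list for dcm2xml from options listed above."""
    command = ["dcm2xml"]
    command.append("--native-format" if native else "--dcmtk-format")
    if to_utf8:
        command.append("--convert-to-utf8")
    if load_all:
        command.append("--load-all")
    if base64:
        if not native:
            # --write-binary-data belongs to the DCMTK-specific format only
            command.append("--write-binary-data")
        command.append("--encode-base64")
    command.append(dcmfile_in)
    if xmlfile_out is not None:
        command.append(xmlfile_out)   # omitted: output goes to stdout
    return command


def run_dcm2xml(command):
    """Run the command; requires dcm2xml to be installed and on PATH."""
    return subprocess.run(command, check=True, capture_output=True, text=True)
```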
DCMTK Format
The basic structure of the DCMTK-specific XML output created from a
DICOM file looks like the following:
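<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "dcm2xml.dtd">
<file-format xmlns="http://dicom.offis.de/dcmtk">
  <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
    <element tag="0002,0000" vr="UL" vm="1" len="4"
             name="MetaElementGroupLength">
      166
    </element>
    ...
    <element tag="0002,0013" vr="SH" vm="1" len="16"
             name="ImplementationVersionName">
      OFFIS_DCMTK_353
    </element>
  </meta-header>
  <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
    <element tag="0008,0005" vr="CS" vm="1" len="10"
             name="SpecificCharacterSet">
      ISO_IR 100
    </element>
    ...
    <sequence tag="0028,3010" vr="SQ" card="2" name="VOILUTSequence">
      <item card="3">
        <element tag="0028,3002" vr="xs" vm="3" len="6"
                 name="LUTDescriptor">
          256\0\8
        </element>
        ...
      </item>
      ...
    </sequence>
    ...
    <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
             name="PixelData" loaded="no" binary="hidden">
    </element>
  </data-set>
</file-format>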
The 'file-format' and 'meta-header' tags are absent for DICOM data
sets.
XML Encoding
Attributes with very large value fields (e.g. pixel data) are not
loaded by default. They can be identified by the additional attribute
'loaded' with a value of 'no' (see example above). The command line
option --load-all forces all value fields to be loaded, including the
very long ones.
Furthermore, binary data of OB and OW attributes are not written to the
XML output file by default. These elements can be identified by the
additional attribute 'binary' with a value of 'hidden' (default is
'no'). The command line option --write-binary-data causes binary value
fields to be printed as well (attribute value is 'yes' or 'base64').
However, be careful when using this option together with --load-all
because of the large amounts of pixel data that might be printed to the
output.
Please note that in this context element values with a VR of OD, OF, OL
and OV are not regarded as 'binary data'.
Multiple values (i.e. where the DICOM value multiplicity is greater
than 1) are separated by a backslash '\' (except for Base64 encoded
data). The 'len' attribute indicates the number of bytes for the
particular value field as stored in the DICOM data set, i.e. it might
deviate from the XML encoded value length e.g. because of non-
significant padding that has been removed. If this attribute is missing
in 'sequence' or 'item' start tags, the corresponding DICOM element has
been stored with undefined length.
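
The attributes described above can be evaluated with any XML parser. The following is a minimal sketch using Python's standard library. The sample document is hypothetical: it is written without an XML namespace declaration, and the attribute names 'tag', 'vr', 'vm' and 'name' are assumptions about the DCMTK-specific format, so they should be checked against dcm2xml.dtd.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like DCMTK-specific output (no namespace);
# the attribute values are made up for illustration.
SAMPLE = """<data-set>
  <element tag="0028,3002" vr="US" vm="3" len="6" name="LUTDescriptor">256\\0\\8</element>
  <element tag="7fe0,0010" vr="OW" vm="1" len="262144" name="PixelData" loaded="no" binary="hidden"></element>
</data-set>"""


def summarize(xml_text):
    """Return (tag, values, loaded, binary) for every 'element' node."""
    rows = []
    for node in ET.fromstring(xml_text).iter("element"):
        text = (node.text or "").strip()
        binary = node.get("binary", "no")          # default is 'no'
        loaded = node.get("loaded", "yes") != "no"  # assumed default: loaded
        if not text:
            values = []
        elif binary == "base64":
            values = [text]              # Base64 data is not backslash-separated
        else:
            values = text.split("\\")    # multiple values: backslash-separated
        rows.append((node.get("tag"), values, loaded, binary))
    return rows
```

An element with 'loaded' set to 'no' or 'binary' set to 'hidden' yields an empty value list here, which distinguishes it from an element whose value was actually written.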
Native DICOM Model Format
The description of the Native DICOM Model format can be found in the
DICOM standard, part 19 ('Application Hosting').
Bulk Data
Binary data, i.e. DICOM element values with Value Representations (VR)
of OB or OW, as well as OD, OF, OL, OV and UN values are by default not
written to the XML output because of their size. Instead, for each
element, a new Universally Unique Identifier (UUID) is generated and
written as an attribute of a <BulkData> XML element. So far, there is
no possibility to write an additional file to hold the binary data for
each of the binary data chunks. This is not required by the standard;
however, it might be useful for implementing an Application Hosting
interface, so this feature may be available in future versions of
dcm2xml.
In addition, Supplement 163 (Store Over the Web by Representational
State Transfer Services) introduces a new <InlineBinary> XML element
that allows for encoding binary data as Base64. Currently, the command
line option --encode-base64 enables this encoding for the following
VRs: OB, OD, OF, OL, OV, OW and UN.
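
A consumer of the Native DICOM Model output has to handle both forms of binary data: UUID references and Base64 content. The following sketch shows one possible way to do this in Python. The sample document is hypothetical; the element and attribute names (NativeDicomModel, DicomAttribute, BulkData with 'uuid', InlineBinary) follow DICOM part 19 as far as known here, and no XML namespace is assumed.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical sample; the tags and values are made up for illustration.
SAMPLE = """<NativeDicomModel>
  <DicomAttribute tag="00420011" vr="OB" keyword="EncapsulatedDocument">
    <InlineBinary>SGVsbG8h</InlineBinary>
  </DicomAttribute>
  <DicomAttribute tag="7FE00010" vr="OW" keyword="PixelData">
    <BulkData uuid="00000000-0000-0000-0000-000000000000"/>
  </DicomAttribute>
</NativeDicomModel>"""


def extract_binary(xml_text):
    """Return (inline bytes by tag, UUID references by tag)."""
    inline, references = {}, {}
    for attribute in ET.fromstring(xml_text).iter("DicomAttribute"):
        tag = attribute.get("tag")
        inline_node = attribute.find("InlineBinary")
        bulk_node = attribute.find("BulkData")
        if inline_node is not None:
            # Written when --encode-base64 is used
            inline[tag] = base64.b64decode(inline_node.text or "")
        elif bulk_node is not None:
            # Default: only a UUID is written, the data itself is not
            references[tag] = bulk_node.get("uuid")
    return inline, references
```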
Known Issues
In addition to what is written in the above section on 'Bulk Data',
there are further known issues with the current implementation of the
Native DICOM Model format. For example, large element values with a VR
other than OB, OD, OF, OL, OV, OW or UN are currently never written as
bulk data, although it might be useful, e.g. for very long text
elements (especially UT) or very long numeric fields (of various VRs).
NOTES
Character Encoding
The XML encoding is determined automatically from the DICOM attribute
(0008,0005) 'Specific Character Set' using the following mapping:
ASCII        "ISO_IR 6"    =>  "UTF-8"
UTF-8        "ISO_IR 192"  =>  "UTF-8"
ISO Latin 1  "ISO_IR 100"  =>  "ISO-8859-1"
ISO Latin 2  "ISO_IR 101"  =>  "ISO-8859-2"
ISO Latin 3  "ISO_IR 109"  =>  "ISO-8859-3"
ISO Latin 4  "ISO_IR 110"  =>  "ISO-8859-4"
ISO Latin 5  "ISO_IR 148"  =>  "ISO-8859-9"
Cyrillic     "ISO_IR 144"  =>  "ISO-8859-5"
Arabic       "ISO_IR 127"  =>  "ISO-8859-6"
Greek        "ISO_IR 126"  =>  "ISO-8859-7"
Hebrew       "ISO_IR 138"  =>  "ISO-8859-8"
If this DICOM attribute is missing in the input file, although needed,
option --charset-assume can be used to specify an appropriate character
set manually (using one of the DICOM defined terms). For reasons of
backward compatibility with previous versions of this tool, the
following terms are also supported and mapped automatically to the
associated DICOM defined terms: latin-1, latin-2, latin-3, latin-4,
latin-5, cyrillic, arabic, greek, hebrew.
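
The mapping above can be expressed as a simple lookup table, for example to predict the encoding declaration of the generated XML document. This is a sketch, not the DCMTK implementation. The association of the legacy terms with DICOM defined terms is an assumption derived from the names in the table, and the function name is hypothetical.

```python
# Specific Character Set (0008,0005) => XML encoding, as listed above
XML_ENCODING = {
    "ISO_IR 6": "UTF-8",
    "ISO_IR 192": "UTF-8",
    "ISO_IR 100": "ISO-8859-1",
    "ISO_IR 101": "ISO-8859-2",
    "ISO_IR 109": "ISO-8859-3",
    "ISO_IR 110": "ISO-8859-4",
    "ISO_IR 148": "ISO-8859-9",
    "ISO_IR 144": "ISO-8859-5",
    "ISO_IR 127": "ISO-8859-6",
    "ISO_IR 126": "ISO-8859-7",
    "ISO_IR 138": "ISO-8859-8",
}

# Legacy terms accepted by --charset-assume (assumed association)
LEGACY_TERMS = {
    "latin-1": "ISO_IR 100",
    "latin-2": "ISO_IR 101",
    "latin-3": "ISO_IR 109",
    "latin-4": "ISO_IR 110",
    "latin-5": "ISO_IR 148",
    "cyrillic": "ISO_IR 144",
    "arabic": "ISO_IR 127",
    "greek": "ISO_IR 126",
    "hebrew": "ISO_IR 138",
}


def xml_encoding_for(charset):
    """Return the XML encoding for a charset term, or None if unmapped."""
    term = LEGACY_TERMS.get(charset, charset)
    return XML_ENCODING.get(term)
```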
Multiple character sets using code extension techniques are not
supported. If needed, option --convert-to-utf8 can be used to convert
the DICOM file or data set to UTF-8 encoding prior to the conversion to
XML format. This is also useful for DICOMDIR files where each directory
record can have a different character set.
If no mapping is defined and option --convert-to-utf8 is not used, non-
ASCII characters and those below #32 are stored as '&#nnn;' where 'nnn'
refers to the numeric character code. This might lead to invalid
character entity references (such as '&#27;' for ESC) and will cause
most XML parsers to reject the document.
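
The effect of this fallback can be reproduced with a few lines of Python. The sketch below only imitates the rule stated above (numeric references for non-ASCII characters and those below #32); it does not escape markup characters and is not the DCMTK code. It then checks whether a standard XML parser accepts the result.

```python
import xml.etree.ElementTree as ET


def escape_unmapped(raw: bytes) -> str:
    """Write bytes below 32 or above 127 as numeric character references."""
    parts = []
    for code in raw:
        if code < 32 or code > 127:
            parts.append("&#%d;" % code)
        else:
            parts.append(chr(code))
    return "".join(parts)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text can be parsed as an XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A value containing an ESC character, as is typical for code extension techniques, produces a reference that an XML 1.0 parser rejects, whereas a reference to a Latin 1 letter is accepted.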
LOGGING
The level of logging output of the various command line tools and
underlying libraries can be specified by the user. By default, only
errors and warnings are written to the standard error stream. With
option --verbose, informational messages such as processing details are
also reported. Option --debug can be used to get more details on the
internal activity, e.g. for debugging purposes. Other logging levels
can be selected using option --log-level. In --quiet mode only fatal
errors are reported. In such very severe error events, the application
will usually terminate. For more details on the different logging
levels, see documentation of module 'oflog'.
If the logging output should be written to a file (optionally with
logfile rotation), to syslog (Unix) or to the event log (Windows),
option --log-config can be used. This configuration file also allows for
directing only certain messages to a particular output stream and for
filtering certain messages based on the module or application where
they are generated. An example configuration file is provided in
/logger.cfg.
COMMAND LINE
All command line tools use the following notation for parameters:
square brackets enclose optional values (0-1), three trailing dots
indicate that multiple values are allowed (1-n), a combination of both
means 0 to n values.
Command line options are distinguished from parameters by a leading '+'
or '-' sign. Usually, order and position of command line
options are arbitrary (i.e. they can appear anywhere). However, if
options are mutually exclusive the rightmost appearance is used. This
behavior conforms to the standard evaluation rules of common Unix
shells.
In addition, one or more command files can be specified using an '@'
sign as a prefix to the filename (e.g. @command.txt). Such a command
argument is replaced by the content of the corresponding text file
(consecutive whitespace characters are treated as a single separator
unless they appear between two quotation marks) prior to any further
evaluation.
Please note that a command file cannot contain another command file.
This simple but effective approach allows one to summarize common
combinations of options/parameters and avoids longish and confusing
command lines (an example is provided in file /dumppat.txt).
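
The expansion of command files described above can be approximated as follows. This is a sketch under stated assumptions, not the DCMTK parser: Python's shlex module is used as a stand-in for the quoting rules, the function names are hypothetical, and arguments read from a command file are not expanded again.

```python
import shlex


def read_file(path):
    """Default reader: return the content of a text file."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def expand_command_files(arguments, read_text=read_file):
    """Replace each '@file' argument by the arguments stored in that file."""
    expanded = []
    for argument in arguments:
        if argument.startswith("@") and len(argument) > 1:
            # Whitespace separates arguments unless enclosed in quotes
            expanded.extend(shlex.split(read_text(argument[1:])))
        else:
            expanded.append(argument)
    return expanded
```

For example, a command file containing the two options --native-format and +U8 turns the argument list ["@common.txt", "in.dcm"] into ["--native-format", "+U8", "in.dcm"].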
ENVIRONMENT
The dcm2xml utility will attempt to load DICOM data dictionaries
specified in the DCMDICTPATH environment variable. By default, i.e. if
the DCMDICTPATH environment variable is not set, the file
/dicom.dic will be loaded unless the dictionary is built into
the application (default for Windows).
The default behavior should be preferred and the DCMDICTPATH
environment variable only used when alternative data dictionaries are
required. The DCMDICTPATH environment variable has the same format as
the Unix shell PATH variable in that a colon (':') separates entries.
On Windows systems, a semicolon (';') is used as a separator. The data
dictionary code will attempt to load each file specified in the
DCMDICTPATH environment variable. It is an error if no data dictionary
can be loaded.
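
The format of the DCMDICTPATH variable can be illustrated with a short Python sketch. The function name is hypothetical. The separator defaults to the platform's PATH separator, which matches the colon and semicolon convention described above.

```python
import os


def dictionary_paths(environ=None, separator=os.pathsep):
    """Return the data dictionary files listed in DCMDICTPATH."""
    environ = os.environ if environ is None else environ
    value = environ.get("DCMDICTPATH", "")
    # An empty result means that the default dictionary applies
    return [entry for entry in value.split(separator) if entry]
```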
FILES
/dcm2xml.dtd - Document Type Definition (DTD) file
SEE ALSO
xml2dcm(1), dcmconv(1)
COPYRIGHT
Copyright (C) 2002-2022 by OFFIS e.V., Escherweg 2, 26121 Oldenburg,
Germany.
Version 3.6.7 Fri Apr 22 2022 dcm2xml(1)