How does XML import work?

Importing XML involves extracting the contents of artifacts that were previously exported to an XML file. When you import the XML file, the artifacts it contains are imported into the project and arranged according to the structure described in the file.

Import type

The import can happen in one of two ways, defined by the <import-type> element in the XML file.

create-same-db (also known as create-different-db): All artifacts being imported are treated as new artifacts. Any pre-existing data is ignored, and all data in the XML file is imported as new data.

This import-type is used for:


update-same-db: The system tries to find existing artifacts in the database based on the <id> element. <issue> elements representing existing artifacts lead to modifications of those artifacts. The same XML may also contain <issue> elements whose <id> does not match an existing artifact; these are imported as new artifacts.
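To make the difference between the two import types concrete, here is a minimal Python sketch of the per-<issue> decision. It is not PT's implementation: the element layout (an <import-type> element and <issue> elements with a direct <id> child somewhere under the root) and the sample IDs are assumptions, and `read_import` and `plan_import` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

def read_import(xml_text: str) -> Tuple[str, List[str]]:
    """Return the <import-type> value and the <id> of every <issue> (assumed layout)."""
    doc = ET.fromstring(xml_text)
    import_type = (doc.findtext(".//import-type") or "").strip()
    ids = [(issue.findtext("id") or "").strip() for issue in doc.iter("issue")]
    return import_type, ids

def plan_import(import_type: str, xml_ids: List[str], saved_ids: Set[str]) -> List[Tuple[str, str]]:
    """Return (action, id) pairs: 'create' for a new artifact, 'modify' for an update."""
    plan = []
    for issue_id in xml_ids:
        if import_type in ("create-same-db", "create-different-db"):
            # Every <issue> becomes a new artifact; pre-existing data is ignored.
            plan.append(("create", issue_id))
        elif import_type == "update-same-db":
            # A matching <id> modifies the saved artifact; anything else is new.
            plan.append(("modify" if issue_id in saved_ids else "create", issue_id))
        else:
            raise ValueError(f"unknown import-type: {import_type!r}")
    return plan
```

With `saved_ids={"PT1"}`, an update-same-db file containing issues PT1 and PT9 yields a modification of PT1 and a new artifact for PT9; the same file imported as create-same-db yields two new artifacts.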

The use case for the existing update implementation is to support a particular client's use of PT as a facilitator for moving artifact data between various external issue trackers. The details of this usage are not known at this time.

Since the primary use of update-same-db is not clear, development has proceeded on the assumption that it might serve as a poor substitute for web service APIs, although scripting with the CGI parameters would be about as intuitive. The following cases might work to varying degrees with the current implementation:


Prerequisites

The current prerequisites for a successful import are that all of the projects, usernames, artifact types, attributes, and attribute options referenced in the XML file must already exist in the system. If any of them cannot be found, the import will fail.

User interface

The input to "XML import" is a single XML file whose details are specified by the DTD. The DTD contains some descriptive comments, but is further explained here. The file may be introduced via HTTP file upload or placed on the server's filesystem as part of procedures agreed upon between the Services and Operations groups.

HTTP file upload

The file upload option is available to users with permission to enter artifacts within a project. However, the link to this functionality, which appears as an Administration navigation item, is only shown to users with the Project Issue Tracking - Configure or Project - Edit permission. This usually means the project owner and other administrative roles.

Results of the import are returned in the HTTP response or, if the import takes longer than a preset limit (currently 30 seconds), emailed to the user. Error results are similar whether they arrive in the immediate response or by email. For email, the success message contains a link to view the results through the web interface. If the import is scripted, you can add the query parameter format=xml to have the results returned as an XML document. When this parameter is present, the response waits until the import is finished and no email is sent after a time limit; however, the HTTP client or server may impose its own maximum time limits.
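A scripted upload might build its request as in the following Python sketch, which adds format=xml to the query string and wraps the file in a multipart/form-data POST. The upload URL, the form field name `xmlFile`, and the file name are hypothetical placeholders; the real values depend on the PT installation's import form.

```python
import urllib.parse
import urllib.request
import uuid

def with_xml_format(url: str) -> str:
    """Return url with format=xml in its query string (replacing any other format)."""
    parts = urllib.parse.urlsplit(url)
    query = [(k, v) for k, v in urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
             if k != "format"]
    query.append(("format", "xml"))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))

def build_upload_request(url: str, xml_bytes: bytes, field: str = "xmlFile",
                         filename: str = "import.xml") -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying the XML file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/xml\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        with_xml_format(url),
        data=head + xml_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Because a format=xml response is held open until the import finishes, passing a generous `timeout` to `urllib.request.urlopen(req, timeout=...)` is advisable; proxies or the server may still close the connection earlier.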

File attachment elements are allowed within the XML but are ignored. For the update-same-db type, a warning may be given.

Command line (on the server)

XML import via the filesystem is used when converting projects from Issuezilla to PT and is also made available to clients through Services. File attachment elements are allowed within the XML but must point to files that exist on the server. Services must scrub any client-provided XML prior to import to prevent access to sensitive files on the server.
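One way Services might scrub attachment paths is to reject any path that resolves outside an agreed import directory. The following Python sketch assumes the path is carried in an element named `filesystem-path` (an assumption; check the DTD for the actual element name) and that `allowed_root` is a hypothetical staging directory. Symbolic links are resolved before the check.

```python
import os
import xml.etree.ElementTree as ET
from typing import List

def unsafe_attachment_paths(xml_text: str, allowed_root: str,
                            path_tag: str = "filesystem-path") -> List[str]:
    """List attachment paths that resolve outside allowed_root."""
    root_dir = os.path.realpath(allowed_root)
    bad = []
    for elem in ET.fromstring(xml_text).iter(path_tag):
        raw = (elem.text or "").strip()
        # Relative paths are taken relative to allowed_root; absolute ones stay absolute.
        resolved = os.path.realpath(os.path.join(root_dir, raw))
        try:
            inside = os.path.commonpath([root_dir, resolved]) == root_dir
        except ValueError:  # e.g. paths on different drives on Windows
            inside = False
        if not inside:
            bad.append(raw)
    return bad
```

For untrusted input, the Python documentation recommends a hardened parser such as defusedxml instead of the standard xml modules.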

It is possible to specify a directory containing several XML files which are to be imported. This is a convenience and the files are treated as independent imports.
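The independence of files in a directory import can be pictured with the Python sketch below: each *.xml file is passed to a hypothetical `import_one` callable, and a failure in one file is recorded without stopping the others. The processing order PT uses is not stated; sorting here only makes the output predictable.

```python
from pathlib import Path
from typing import Callable, Dict

def import_directory(directory: str, import_one: Callable[[Path], None]) -> Dict[str, str]:
    """Import every *.xml file in directory as a separate, independent import."""
    results: Dict[str, str] = {}
    for path in sorted(Path(directory).glob("*.xml")):
        try:
            import_one(path)
            results[path.name] = "ok"
        except Exception as exc:  # record the failure and continue with the next file
            results[path.name] = f"failed: {exc}"
    return results
```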

A successful import with no warnings results in no messages. Errors and warnings are printed to standard out.

Validation

Validation can include checking the XML against a DTD. The data contained in the XML file will also be compared against the state of the application.

XML validation

If a DOCTYPE declaration is given in the XML file, it must be specified as <!DOCTYPE Tracker-issues SYSTEM "PROTO://www.DOMAIN/dtd/pt.dtd"> for releases 3.0.1 and later (where PROTO is https or http), or as <!DOCTYPE Tracker-issues SYSTEM "http://project.domain.com/dtd/scarab-0.16.29.dtd"> for releases 3.0.0 and 2.6.x. If one of these DOCTYPE declarations is given, the XML is validated against the current DTD for the release.
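Before submitting a file, a script could check which of the accepted DOCTYPE forms it uses. This Python sketch is a hypothetical pre-flight check built only from the two forms listed above; it matches text with regular expressions, treats any host beginning with "www." as a stand-in for www.DOMAIN, and does not perform DTD validation itself.

```python
import re

# The two accepted forms described above.
CURRENT_DTD = re.compile(r"^https?://www\.[^/\s]+/dtd/pt\.dtd$")
LEGACY_DTD = "http://project.domain.com/dtd/scarab-0.16.29.dtd"
DOCTYPE = re.compile(r'<!DOCTYPE\s+Tracker-issues\s+SYSTEM\s+"([^"]*)"\s*>')

def doctype_form(xml_text: str) -> str:
    """Classify the DOCTYPE declaration of an import file."""
    if "<!DOCTYPE" not in xml_text:
        return "none"  # no declaration at all
    match = DOCTYPE.search(xml_text)
    if match is None:
        return "invalid"  # a declaration exists but not in an accepted shape
    system_id = match.group(1)
    if CURRENT_DTD.match(system_id):
        return "3.0.1+"
    if system_id == LEGACY_DTD:
        return "3.0.0/2.6.x"
    return "invalid"
```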

Semantic validation

The data contained within the XML file is checked against the current metadata within the PT database. All metadata, such as artifact types and attributes, must be available as a prerequisite. Attributes that are given values in the XML must be associated with the artifact type of the given artifact, though they may be inactive. An attribute name in the XML must map to the global attribute name. Attribute options may use either the global name or the alias used for the option in the project and artifact type.
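The attribute and option rules above can be modelled as a lookup against per-artifact-type metadata. In this Python sketch, `ArtifactTypeMeta` and `resolve_option` are hypothetical, and the metadata is supplied by hand rather than read from the PT database.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class ArtifactTypeMeta:
    """Hypothetical metadata for one artifact type in one project."""
    attributes: Set[str]  # global attribute names, active or inactive
    options: Dict[str, Set[str]] = field(default_factory=dict)  # attribute -> global option names
    aliases: Dict[str, Dict[str, str]] = field(default_factory=dict)  # attribute -> {alias: global}

def resolve_option(meta: ArtifactTypeMeta, attribute: str, value: str) -> str:
    """Map an option given in the XML to its global name, or raise ValueError."""
    if attribute not in meta.attributes:
        raise ValueError(f"{attribute!r} is not associated with this artifact type")
    if value in meta.options.get(attribute, set()):
        return value  # the XML already uses the global option name
    alias_map = meta.aliases.get(attribute, {})
    if value in alias_map:
        return alias_map[value]  # the XML uses the project/artifact-type alias
    raise ValueError(f"unknown option {value!r} for attribute {attribute!r}")
```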

Validation of dependencies involves checking that the dependency type and the artifacts involved exist. See the Comparison of create and update types section for details on information that is not validated.

Date strings given in the XML must be parsable according to the format string supplied with the date.

Usernames used in the XML must map to actual users within the domain (or host for host users).

File attachments must contain a valid path to a file on the server. This is only validated if file attachments are allowed.

All data is validated before the import starts. Changes to the metadata between the validation step and the end of the import could therefore cause import failures, leaving artifacts partially imported.

Note: Required attributes are ignored when importing artifacts using the XML Import feature. When creating a new artifact through the CollabNet user interface, you must supply a value for every attribute marked as required (indicated by an "*"). Artifacts created through an XML import, however, can be created even if values are missing for required attributes.

Attribute dependencies

Enforcement of attribute dependencies is disabled in the importing thread. Audit information is unavailable for transitions and required attributes, making it impossible to know whether importing an artifact that includes activity at some point in the past was legal. Even if Project Tracker maintained an audit of this information, the XML import feature can import data from a legacy issue tracker, and the activity on the artifact may predate the changes to metadata within Project Tracker. It is assumed that the data met the requirements for the state it was in at any given time.

An artifact can be imported without required attribute values. The required values will be enforced when the artifact is first edited in Project Tracker.

Artifact dependencies

Dependencies can be specified between artifacts contained within the same XML file. You cannot declare dependencies between artifacts in separate XML files being imported. For the create-same-db type you cannot specify dependencies between artifacts in the XML and artifacts already entered into PT. It is possible to specify dependencies to already entered artifacts using the update-same-db type.
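The scoping rules for dependencies can be expressed as a check on each dependency's endpoints. This Python sketch is hypothetical: dependencies are plain (source, target) ID pairs, and `saved_ids` stands in for artifacts already entered into PT. Because only IDs from the current file are passed in as `xml_ids`, dependencies on artifacts in other files being imported are rejected as well.

```python
from typing import Iterable, List, Set, Tuple

def check_dependencies(import_type: str, deps: Iterable[Tuple[str, str]],
                       xml_ids: Set[str], saved_ids: Set[str]) -> List[str]:
    """Return an error for each dependency endpoint that cannot be resolved."""
    # Only update-same-db may point at artifacts already saved in PT.
    resolvable = (xml_ids | saved_ids) if import_type == "update-same-db" else xml_ids
    errors = []
    for source, target in deps:
        for artifact_id in (source, target):
            if artifact_id not in resolvable:
                errors.append(f"{source} -> {target}: {artifact_id} not found")
    return errors
```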

Activity-sets

Activities can be grouped as happening simultaneously, but some activities are not normally grouped with others and so are specified as an activity-set containing a single activity:


Several activities altering attribute values and dependencies may be contained within a single activity-set. However, an activity-set should not contain conflicting activities, such as "Status changed from NEW to STARTED" together with "Status changed from NEW to FIXED". Validation will not catch such a combination.
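Since validation does not catch conflicting activities within one activity-set, a script could check for them before import. The Python sketch below is a hypothetical check that models each activity as an (attribute, old-value, new-value) tuple and reports single-valued attributes that receive more than one distinct new value in the same set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Activity = Tuple[str, Optional[str], Optional[str]]  # (attribute, old-value, new-value)

def conflicting_changes(activity_set: Iterable[Activity], single_valued: Set[str]) -> List[str]:
    """Single-valued attributes given more than one distinct new value in one set."""
    new_values: Dict[str, Set[str]] = defaultdict(set)
    for attribute, _old, new in activity_set:
        if attribute in single_valued and new is not None:
            new_values[attribute].add(new)
    return sorted(name for name, values in new_values.items() if len(values) > 1)
```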

Created and modified date information is ignored for attachment elements, which include comments/reasons, URLs, and files. File attachments are ignored when the XML is uploaded via the web. An activity whose attachment has the same ID as a previous attachment within the XML creates another history record for the initial attachment but does not modify the original attachment. Therefore, the first reference to an attachment within the XML must contain its final state. Later references are generally the same, but they are ignored regardless, as long as they share the same ID.

The old-value and new-value elements within an activity are used to add or modify attribute values. A single-valued attribute may contain both an old and a new value, which is interpreted as changing the value. Multi-valued attributes should have only one or the other element. For example, the assignee is not changed from "user1" to "user2"; instead, "user1" is removed as an assignee and "user2" is added as an assignee. Supplying an option or username as the old-value and nothing for the new-value is interpreted as removing that value.
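The old-value/new-value rules can be summarised as a small function over an attribute's current set of values. This Python sketch models the rules as described and is not PT's code; `apply_activity` is hypothetical, and raising an error when a multi-valued activity carries both elements is this sketch's choice, since the text only says it should not.

```python
from typing import Optional, Set

def apply_activity(current: Set[str], old: Optional[str], new: Optional[str],
                   multi_valued: bool) -> Set[str]:
    """Return the attribute's values after one activity (a model of the rules above)."""
    if multi_valued and old is not None and new is not None:
        raise ValueError("a multi-valued activity should carry only old-value or new-value")
    values = set(current)
    if old is not None:
        values.discard(old)  # old-value: this value is removed (or replaced)
    if new is not None:
        if not multi_valued:
            values.clear()  # a single-valued attribute keeps only the new value
        values.add(new)
    return values
```

For the assignee example, `apply_activity(apply_activity({"user1"}, "user1", None, True), None, "user2", True)` removes "user1" and then adds "user2", leaving `{"user2"}`.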

Sample XML

Examples of several modifications showing the more relevant sections of the XML are available in the Activity signatures document. A few examples are also available:


Comparison of create and update types

Validation

Dependencies in the XML can reference artifacts already entered into the system for the "update" type. "Create" type requires self-consistency among the dependencies and artifacts found in the XML.

New artifacts

The "create" type always results in a newly saved artifact for each issue element in the XML. With the "update" type, the id element is compared against saved artifacts: if the ID matches a saved artifact, that artifact is modified; otherwise a new artifact is created as with the "create" type. Newly saved artifacts are assigned new IDs according to the prefix and counter in use by the project.

Activity-sets

In the "create" type, activity-set dates are used as specified in the XML input. For an update-same-db import, if an artifact is being updated, the created-date of each activity-set is compared both to the artifact's last modified date and to the current time. Activity-sets dated before the last modified date are ignored, and a warning is given. Dates in the future are converted to the time of the import; within the constraints of the DTD, this provides a way to make an update use the current time.
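The date handling for updated artifacts can be sketched as follows. `effective_dates` is a hypothetical Python helper; `last_modified` is `None` when the artifact is new rather than being updated, in which case dates are used as given. Dates equal to the last modified date are kept, because the text only mentions earlier ones.

```python
from datetime import datetime
from typing import List, Optional, Tuple

def effective_dates(created_dates: List[datetime], last_modified: Optional[datetime],
                    now: datetime) -> Tuple[List[datetime], List[str]]:
    """Return the activity-set dates actually used and any warnings."""
    kept: List[datetime] = []
    warnings: List[str] = []
    for created in created_dates:
        if last_modified is None:
            kept.append(created)  # new artifact: dates are used as given
        elif created < last_modified:
            warnings.append(f"activity-set dated {created.isoformat()} ignored")
        elif created > now:
            kept.append(now)  # future dates become the import time
        else:
            kept.append(created)
    return kept, warnings
```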

Activity

An activity with a null end-date results in new activity for either type. Normally, a null end-date on an attribute-related activity means the activity represents the current value of the attribute. Activities with end-date values are ignored for the "update" import type.

Dependencies

Attempts to add dependencies that are already present, or to delete dependencies that do not exist, are silently ignored for the "update" type. Doing the same with the "create" type causes an exception, and neither that dependency nor any later dependencies are saved. All the artifacts will still have been saved. Updating a dependency that does not exist is ignored for both types.
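The differing treatment of redundant dependency operations is modelled below in Python. The sketch is hypothetical: dependencies are (source, target) pairs without a type, so updating a dependency is not modelled, and the returned message stands in for the exception raised by the "create" type.

```python
from typing import List, Optional, Set, Tuple

Dep = Tuple[str, str]  # (source artifact ID, target artifact ID)

def apply_dependency_ops(existing: Set[Dep], ops: List[Tuple[str, Dep]],
                         import_type: str) -> Tuple[Set[Dep], Optional[str]]:
    """Apply 'add'/'delete' operations in order; return the saved set and any error."""
    deps = set(existing)
    for op, dep in ops:
        redundant = (op == "add" and dep in deps) or (op == "delete" and dep not in deps)
        if redundant:
            if import_type == "update-same-db":
                continue  # silently ignored
            # create type: this and every later dependency operation is not saved
            return deps, f"cannot {op} {dep}"
        if op == "add":
            deps.add(dep)
        elif op == "delete":
            deps.discard(dep)
        else:
            raise ValueError(f"unsupported operation {op!r}")
    return deps, None
```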