XML Workflow Advantages: Sub-Setting the JATS DTD
The National Information Standards Organization (NISO) held its second annual conference, NISO Plus 2021, on February 22-25. Aries Systems is pleased to have sponsored the virtual meeting, which included 850 participants from 26 countries around the world. Aries Business Systems Analyst Charles O’Connor presented “Sub-setting the JATS DTD – So What?” during the Solving Problems with Standards session on Tuesday, February 23rd. Aries’ LiXuid Manuscript™ philosophy embodies our vision for the future of scholarly publishing, in which XML (eXtensible Markup Language) is leveraged to streamline the entire publishing process. To make this vision a reality, Aries is harnessing the power of JATS and the advantages of sub-setting the DTD.
The Journal Article Tag Suite (JATS) is an XML format used to describe the textual content of scholarly published works, such as research articles. It was first developed by the National Library of Medicine in 2003, but has since grown beyond biomedical information to be used for STM, social science, and humanities content. Considered a de facto standard in the community for many years, JATS is now a technical standard, maintained by NISO since 2012. As scholarly publishers transition from manual, PDF-based workflows to automated, XML-based workflows, they may find important advantages in sub-setting the JATS DTD.
By design, JATS is a descriptive DTD rather than a prescriptive one, so it allows several different ways to capture the same content. JATS supports eleven methods of associating authors with affiliations, has two distinct bibliographic reference models, and offers more than one way to record publication history (e.g., <history> or <pub-history>). This flexibility was necessary to accommodate widely divergent journal styles and legacy content, but it poses problems for those building tools that bring XML forward into more automated publishing workflows: every permitted variation has to be handled, which makes those tools unnecessarily complex and expensive for publishers to develop and maintain.
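To make the cost of this looseness concrete, here is a minimal Python sketch that resolves author affiliations from a JATS-like fragment. It handles only two of the permitted patterns (an <xref> pointing at an <aff> elsewhere, and an <aff> nested inside <contrib>); the sample data, the names in it, and the function name `affiliations` are illustrative assumptions, and the fragment is not a complete valid JATS article.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: two authors, each using a different
# affiliation pattern that JATS permits.
SAMPLE = """
<article>
  <front>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Rivera</surname><given-names>Ana</given-names></name>
          <xref ref-type="aff" rid="aff1"/>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Chen</surname><given-names>Wei</given-names></name>
          <aff>Example Institute</aff>
        </contrib>
        <aff id="aff1">Sample University</aff>
      </contrib-group>
    </article-meta>
  </front>
</article>
"""


def affiliations(root):
    """Map each author surname to a list of affiliation strings."""
    # Index every <aff> that carries an id so xref pointers can be resolved.
    aff_by_id = {
        aff.get("id"): "".join(aff.itertext()).strip()
        for aff in root.iter("aff")
        if aff.get("id")
    }
    result = {}
    for contrib in root.iter("contrib"):
        surname = contrib.findtext("name/surname")
        found = []
        # Pattern 1: <xref ref-type="aff" rid="..."> pointing at an <aff> elsewhere.
        for xref in contrib.findall("xref[@ref-type='aff']"):
            for rid in (xref.get("rid") or "").split():
                if rid in aff_by_id:
                    found.append(aff_by_id[rid])
        # Pattern 2: <aff> nested directly inside <contrib>.
        for aff in contrib.findall("aff"):
            found.append("".join(aff.itertext()).strip())
        result[surname] = found
    return result


root = ET.fromstring(SAMPLE)
print(affiliations(root))
```

Each additional pattern a tool has to accept adds another branch like these, along with its own tests and edge cases.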
Fortunately, the JATS DTD was also designed to be easy to subset. It is modular, so publishers don’t need to touch the common files; they simply create a set of override content models in high-level files, without having to update each content module individually. Content analysts can narrow the variations that developers are required to build to, making automated systems cheaper to develop and more robust. In addition, sub-setting provides predictable content for rendering and transformation, and it sets clear expectations for suppliers and other partners, which is helpful even if you aren’t building XML tools. A well-designed subset that takes industry initiatives such as JATS for Reuse (JATS4R) into account also helps make XML content more machine-readable and thus more discoverable.
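As a rough illustration of what a narrowed content model buys a developer, the Python sketch below checks a fragment against a hypothetical subset in which affiliations may only be linked through <xref>. In practice the subset would be expressed in the DTD's own override modules and enforced by a validating parser; the dictionary `SUBSET_ALLOWED_CHILDREN` and the function `violations` are stand-ins invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset rules: for each element, the child elements the subset
# permits. Elements not listed here are left unchecked in this sketch.
SUBSET_ALLOWED_CHILDREN = {
    "contrib": {"name", "xref"},        # no nested <aff>: link via <xref> only
    "contrib-group": {"contrib", "aff"},
}


def violations(root):
    """Return (parent, child) tag pairs that fall outside the subset."""
    problems = []
    for parent in root.iter():
        allowed = SUBSET_ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append((parent.tag, child.tag))
    return problems


conforming = ET.fromstring(
    '<contrib-group><contrib><name/><xref ref-type="aff" rid="aff1"/></contrib>'
    '<aff id="aff1">Sample University</aff></contrib-group>'
)
nonconforming = ET.fromstring(
    '<contrib-group><contrib><name/><aff>Example Institute</aff></contrib>'
    '</contrib-group>'
)

print(violations(conforming))     # []
print(violations(nonconforming))  # [('contrib', 'aff')]
```

With a rule like this in force, downstream code only has to implement the single affiliation pattern the subset allows.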
There are also some things to avoid when sub-setting the DTD. First and foremost, do not remove anything that is mandatory in the parent DTD; otherwise, documents that conform to your subset won’t be valid against the parent DTD. Secondly, do not delete anything unless there is a specific “technical win” to be had, and be thorough in your reasoning for each removal. Lastly, document everything, as you will need clear documentation to support your subset of the DTD. For anyone interested in JATS customization, we recommend reading the JATS compatibility meta-model, which provides key insights into how JATS is structured.
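The first rule can be checked mechanically. The Python sketch below uses a deliberately simplified model, in which each element maps to the children it requires and the children it merely allows, and reports where a subset would break compatibility with its parent. The data structures and the function `compatibility_errors` are assumptions made for illustration; real DTD content models also constrain order and repetition, which this sketch ignores.

```python
# Simplified, hypothetical content models: required vs. optional children.
PARENT = {
    "article": {
        "required": {"front"},
        "optional": {"body", "back", "floats-group", "sub-article", "response"},
    },
}


def compatibility_errors(parent, subset):
    """List the ways a subset model breaks compatibility with its parent."""
    errors = []
    for element, sub_model in subset.items():
        par_model = parent[element]
        par_allowed = par_model["required"] | par_model["optional"]
        sub_allowed = sub_model["required"] | sub_model["optional"]
        # The subset may not introduce children the parent does not allow.
        for child in sorted(sub_allowed - par_allowed):
            errors.append(f"{element}: <{child}> is not allowed by the parent")
        # Anything the parent requires must stay required in the subset.
        for child in sorted(par_model["required"] - sub_model["required"]):
            errors.append(
                f"{element}: <{child}> is required by the parent "
                "but is not required in the subset"
            )
    return errors


# Tightening is fine: <body> becomes required and several options are dropped.
good_subset = {"article": {"required": {"front", "body"}, "optional": {"back"}}}
# Dropping the mandatory <front> is not.
bad_subset = {"article": {"required": {"body"}, "optional": {"back"}}}

print(compatibility_errors(PARENT, good_subset))
print(compatibility_errors(PARENT, bad_subset))
```

Making an optional element required, or removing an optional one, keeps every subset-valid document valid against the parent; removing or loosening a required element does not.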
There are many benefits to using JATS XML early in the editorial workflow and throughout the production workflow, including reduced costs and shorter time to publication. Article content in a structured XML format is also more amenable to automated analysis (such as AI analysis or QA with Schematron) than content in a Word or PDF document. However, building and maintaining the toolset required to support XML workflows brings challenges of its own. The solution to many of these challenges is to subset the JATS DTD.
For more information, check out the full presentation on Subsetting the JATS DTD.