Text Corpus in Collaboration - A Balance Between Customized and Standardized Approaches
Extensible Markup Language (XML) that complies with the Text Encoding Initiative (TEI) guidelines is the predominant standard for creating, encoding, and managing digital textual data in the Digital Humanities (DH). The Collaborative Research Center (CRC) 1475, "Metaphors of Religion," uses this robust and flexible standard to build a shared infrastructure for metaphor analysis across diverse religious traditions, languages, and time periods ranging from 2000 BCE to the present day. DH projects often begin with an emphasis on data reusability and select widely accepted standards and licenses accordingly. Many research contexts, however, start from a scholarly interest and must integrate heterogeneous sources that were not originally intended for reuse. The information infrastructure subproject (INF) uses TEI-XML to harmonize and integrate texts from the various subprojects, accommodating differences in language, editorial processes, and file formats within a flexible yet standardized schema. This approach establishes a core framework that ensures structural and semantic consistency while allowing extensions tailored to the needs of individual subprojects. The resulting balanced and scalable setup supports comprehensive metaphor analysis, maintaining uniformity across the collaborative environment while accommodating specialized requirements. This paper describes the development of this framework and shows how it bridges the gap between standardization and customization in digital text corpora.
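The idea of a shared core with room for subproject extensions can be illustrated with a minimal sketch. It is not the CRC's actual schema: the list of required paths, the function name `missing_core_elements`, and the sample document (including the `seg` element marked with `type="metaphor"`) are assumptions made for illustration only. The sketch checks that a TEI document contains a small set of core elements and ignores anything beyond that core, so subprojects remain free to add their own markup.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical core requirements shared by all subprojects
# (an assumption for illustration, not the project's real schema).
CORE_PATHS = [
    "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
    "tei:teiHeader/tei:profileDesc/tei:langUsage/tei:language",
    "tei:text/tei:body",
]


def missing_core_elements(xml_string):
    """Return the core paths that are absent from a TEI document.

    Elements outside the core list are not inspected, which is how
    this sketch leaves room for subproject-specific extensions.
    """
    root = ET.fromstring(xml_string)
    if root.tag != f"{{{TEI_NS}}}TEI":
        return ["tei:TEI (root element)"]
    return [path for path in CORE_PATHS if root.find(path, NS) is None]


# Invented sample document; the <seg type="metaphor"> annotation stands in
# for a subproject-specific extension on top of the shared core.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text</title></titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
    <profileDesc>
      <langUsage><language ident="la">Latin</language></langUsage>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Dominus <seg type="metaphor">pastor</seg> meus est.</p>
    </body>
  </text>
</TEI>"""

if __name__ == "__main__":
    print(missing_core_elements(SAMPLE))
```

In practice, a TEI customization of this kind would more likely be expressed as an ODD file and validated with a schema language such as RELAX NG; the path check above only mirrors the principle of a fixed core with open extensions.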