KULA
Knowledge Creation, Dissemination, and Preservation Studies
Volume 6, Number 3, 2022: Metadata as Knowledge. Guest-edited by Stacy Allison-Cassin and Dean Seeman
Table of contents (14 articles)
Introduction
-
Metadata as Knowledge
Stacy Allison-Cassin and Dean Seeman
pp. 1–4
Abstract:
Introduction to "Metadata as Knowledge," a special issue of KULA: Knowledge Creation, Dissemination, and Preservation Studies that takes up the critical relationship between metadata and knowledge. The issue includes articles and project reports that address metadata, hidden knowledge, and labour; standards versus expression; knowledge sharing and reuse of metadata; forays into open and shared knowledge; linked data, metadata translation, and discovery; and machine learning and knowledge graphs. Although rarely an object of notice or scrutiny by its users, metadata governs the circulation of information and has the power to name, broadcast, normalize, oppress, and exclude. As the contributions to this issue demonstrate, metadata is knowledge, and metadata creators, systems, and practices must contend with how metadata means.
Research Articles
-
Knowledge Lost, Knowledge Gained: The Implications of Migrating to Online Archival Descriptive Systems
Daniela Ansovini, Kelli Babcock, Tanis Franco, Jiyun Alex Jung, Karen Suurtamm and Alexandra Wong
pp. 1–19
Abstract:
Migrating archival description from paper-based finding aids to structured online data reconfigures the dynamics of archival representation and interactions. This paper considers the knowledge implications of transferring traditional finding aids to Discover Archives, a university-wide implementation of Access to Memory (AtoM) at the University of Toronto. The migration and translation of varied descriptive practices to conform to a single system that is accessible to anyone, anywhere, effectively shifts both where and how users interface with archives and their material. This paper reflects on how different sets of knowledge are reorganized in these shifts. Discover Archives empowers researchers to do independent searches using the full breadth of their domain expertise, seemingly unbound from archival gatekeeping. At the same time, these searches are performed in the absence of archivists' unstructured mediation, where searches benefit from human interaction and the kinds of knowledges that reference staff draw on to handle complex reference questions, especially those from novice archival users. We explore the extent to which that lost knowledge can be drawn back into archival interactions via rich metadata that documents contexts and relationships embedded within Discover Archives and beyond. Internal user experience design (UXD) research on Discover Archives highlights a gap between current online description and habitual user expectations in web search and discovery. To help bridge this gap, we contributed to broader discovery nodes such as linked open "context hubs" like Wikipedia and Wikidata, which can supplement hierarchical description with linked metadata and visualization capabilities. These can reintroduce rhizomatic and serendipitous connections, enabled by archivist, researcher, and larger sets of community knowledges, to the benefit of both the user and the archivist.
-
Working Knowledge: Catalogers and the Stories They Tell
Amanda Belantara and Emily Drabinski
pp. 1–10
Abstract:
Cataloging librarians make myriad choices every day as they create the metadata necessary for information retrieval. Each record represents an interaction between the cataloger and the systems they work within and, sometimes, against. Their work is highly constrained by standardized machine-readable fields and codes, controlled subject terms and classification schema. In the exploratory research project Catalogers at Work, the authors use sound recording to reveal the complex yet hidden negotiations embedded in library catalog records.
-
Re-purposing Excavation Database Content as Paradata: An Explorative Analysis of Paradata Identification Challenges and Opportunities
Lisa Börjesson, Olle Sköld, Zanna Friberg, Daniel Löwenborg, Gísli Pálsson and Isto Huvila
pp. 1–18
Abstract:
Although data reusers request information about how research data was created and curated, this information is often non-existent or only briefly covered in data descriptions. The need for such contextual information is particularly critical in fields like archaeology, where old legacy data created during different time periods and through varying methodological framings and fieldwork documentation practices retains its value as an important information source. This article explores the presence of contextual information in archaeological data with a specific focus on data provenance and processing information, i.e., paradata. The purpose of the article is to identify and explicate types of paradata in field observation documentation. The method used is an explorative close reading of field data from an archaeological excavation enriched with geographical metadata. The analysis covers technical and epistemological challenges and opportunities in paradata identification, and discusses the possibility of using identified paradata in data descriptions and for data reliability assessments. Results show that it is possible to identify both knowledge organisation paradata (KOP) relating to data structuring and knowledge-making paradata (KMP) relating to fieldwork methods and interpretative processes. However, while the data contains many traces of the research process, there is an uneven and, in some categories, low level of structure and systematicity that complicates automated metadata and paradata identification and extraction. The results show a need to broaden the understanding of how structure and systematicity are used and how they impact research data in archaeology and in comparable field sciences. The insights into how a dataset’s KOP and KMP can be read are also a methodological contribution to data literacy research and practice development.
On a repository level, the results underline the need to include paradata about dataset creation, purpose, terminology, dataset internal and external relations, and eventual data colloquialisms that require explanation to reusers.
-
The Power to Structure: Making Meaning from Metadata Through Ontologies
Erin Canning, Susan Brown, Sarah Roger and Kimberley Martin
pp. 1–15
Abstract:
Information systems are developed by people with intent—they are designed to help creators and users tell specific stories with data. Within information systems, the often invisible structures of metadata profoundly impact the meaning that can be derived from that data. The Linked Infrastructure for Networked Cultural Scholarship project (LINCS) helps humanities researchers tell stories by using linked open data to convert humanities datasets into organized, interconnected, machine-processable resources. LINCS provides context for online cultural materials, interlinks them, and grounds them in sources to improve web resources for research. This article describes how the LINCS team is using the shared standards of linked data and especially ontologies—typically unseen yet powerful—to bring meaning mindfully to metadata through structure. The LINCS metadata—comprised of linked open data about cultural artifacts, people, and processes—and the structures that support it must represent multiple, diverse ways of knowing. It needs to enable various means of incorporating contextual data and of telling stories with nuance and context, situated and supported by data structures that reflect and make space for specificities and complexities. As it addresses specificity in each research dataset, LINCS is simultaneously working to balance interoperability, as achieved through a level of generalization, with contextual and domain-specific requirements. The LINCS team’s approach to ontology adoption and use centers on intersectionality, multiplicity, and difference. The question of what meaning the structures being used will bring to the data is as important as what meaning is introduced as a result of linking data together, and the project has built this premise into its decision-making and implementation processes. 
To convey an understanding of categories and classification as contextually embedded—culturally produced, intersecting, and discursive—the LINCS team frames them not as fixed but as grounds for investigation and starting points for understanding. Metadata structures are as important as vocabularies for producing such meaning.
-
Trouble in Paradise: Expanding Applications of the Getty Thesaurus of Geographic Names® to Enhance Intellectual Discoverability of Circum-Caribbean Materials
Alexandra Gooding
pp. 1–17
Abstract:
This article examines how the Circum-Caribbean region’s cultural and geographic complexities make it difficult to describe or index relevant archival materials using the mainstream authority controls used in galleries, libraries, archives, and museums (GLAMs). This difficulty stems from the fact that authority controls utilised by GLAMs are primarily created by North American or European authorities and, therefore, have Western-centric views imbued with colonialist overtones. When these systems are used to catalogue, index, or describe Circum-Caribbean-related collection materials, a tension arises: a system with a white, Euro-American perspective is applied to material reflective of a significantly multicultural place, culture, subject, and population. The rigidity of controlled vocabularies and their applications—which typically follow specific indexing methodologies—cannot accommodate the fluidity necessary to accurately denote the complex Circum-Caribbean region, especially with regard to geographic indexing. This article demonstrates the difficulties that emerge from trying to delimit and define the Caribbean region; provides an abbreviated analysis of the Circum-Caribbean’s representation in the Getty Thesaurus of Geographic Names® (TGN), which mirrors the difficulties of defining and delimiting the region; and presents a case study in which the West Indian Postcard Collection at Cambridge University Library was indexed using augmented applications of the TGN. The research presented in this paper supports the theory that employing both general and specific indexing strategies creates enhanced access to Caribbean-related collection materials by enabling regional, sub-regional, and territorial/national avenues to retrieve collection materials.
-
Knowledge Graphs, Metadata Practices, and Badiou's Mathematical Ontology
John Huck
pp. 1–17
Abstract:
Metadata practices in libraries have been shifting towards a graph-centric data model for a number of years due to the influence of the Semantic Web on metadata standards as well as the ongoing engagement of libraries with linked data. This trend is likely to be sustained by the growth of the knowledge graph domain, which is animated by the interests of large technology companies and which represents a continuation of earlier programmes such as expert systems and the Semantic Web. Given the role of Semantic Web ontologies in knowledge graph development and the relevance of philosophical questions of ontology to cataloguing theory, metadata practitioners require theoretical frameworks suitable for conceptualizing the knowledge graph data model’s mixture of data and ontology. To that end, this paper considers the mathematical ontology of philosopher Alain Badiou, which employs set theory to schematize a theory of the multiple. It outlines how Badiou’s ontology is compatible with the graph data model and what it offers to metadata practitioners seeking to critically engage the knowledge graph paradigm.
-
Using Linked Data Sources to Enhance Catalog Discovery
Huda Khan, Claire DeMarco, Christine Fernsebner Eslao, Steven Folsom, Jason Kovari, Simeon Warner, Tim Worrall and Astrid Usong
pp. 1–26
Abstract:
Our research explores how linked data sources and non-library metadata can support open-ended discovery of library resources. We also consider which experimental methods are best suited to the improvement of library catalog systems. We provide an overview of the questions driving our discovery experiments with linked data, a summary of some of our usability findings, as well as our design and implementation approach. In addition, we situate the discussion of our work within the larger framework of library cataloging and curation practices.
-
Leveraging Wikidata to Build Scholarly Profiles as Service
Mairelys Lemus-Rojas, Jere Odell, Lucille Frances Brys and Mirian Ramirez Rojas
pp. 1–14
Abstract:
In this article, the authors share the different methods and tools utilized for supporting the Scholarly Profiles as Service (SPaS) model at Indiana University–Purdue University Indianapolis (IUPUI). Leveraging Wikidata to build a scholarly profile service aligns with interests in supporting open knowledge and provides opportunities to address information inequities. The article accounts for the authors' decision to focus first on profiles for women scholars at the university and provides a detailed case study of how these profiles are created. By describing the processes of delivering the service, the authors hope to inspire other academic libraries to work toward establishing stronger open data connections between academic institutions, their scholars, and their scholars' publications.
-
Ethical Considerations of Including Gender Information in Open Knowledge Platforms
Nerissa Lindsey, Greta Kuriger Suiter and Kurt Hanselman
pp. 1–15
Abstract:
In recent years, galleries, libraries, archives, and museums (GLAMs) have sought to leverage open knowledge platforms such as Wikidata to highlight or provide more visibility for traditionally marginalized groups and their work, collections, or contributions. Efforts like Art + Feminism, local edit-a-thons, and, more recently, GLAM institution-led projects have promoted open knowledge initiatives to a broader audience of participants. One such open knowledge project, the Program for Cooperative Cataloging (PCC) Wikidata Pilot, has brought together over seventy GLAM organizations to contribute linked open data for individuals associated with their institutions, collections, or archives. However, these projects have brought up ethical concerns around including potentially sensitive personal demographic information, such as gender identity, sexual orientation, race, and ethnicity, in entries in an open knowledge base about living persons. GLAM institutions are thus in a position of balancing open access with ethical cataloging, which should include adhering to the personal preferences of the individuals whose data is being shared. People working in libraries and archives have been increasingly focusing their energies on issues of diversity, equity, and inclusion in their descriptive practices, including remediating legacy data and addressing biased language. Moving this work into a more public sphere and scaling up in volume creates potential risks to the individuals being described. While adding demographic information on living people to open knowledge bases has the potential to enhance, highlight, and celebrate diversity, it could also potentially be used to the detriment of the subjects through surveillance and targeting activities. 
In this article we seek to investigate the changing role of metadata and open knowledge in addressing, or not addressing, issues of under- and misrepresentation, especially as they pertain to gender identity as described in the sex or gender property in Wikidata. We report findings from a survey investigating how organizations participating in open knowledge projects are addressing ethical concerns around including personal demographic information as part of their projects, including what, if any, policies they have implemented and what implications these activities may have for the living people being described.
-
Semantic Encyclopedias and Boolean Dreams
Alexandra Provo
pp. 1–15
Abstract:
When metadata becomes knowledge, opportunities for multiplicity and risks of harm and exclusion arise. As GLAM institutions contribute to the Semantic Web, we must pay attention to the implications of participation. While the Semantic Web grew out of the flourishing of web technologies in the 1990s, recognizing its roots in classical/symbolic AI (referred to as Good Old Fashioned Artificial Intelligence, or GOFAI)—in particular, expert systems and knowledge representation—encourages critical questions like: which problems from knowledge representation and expert systems does the Semantic Web inherit? Are GOFAI failures really failures, or does the gap between rhetoric and practice point to generative possibilities (some of which can now be seen in Semantic Web initiatives)? What can we learn from AI critics, feminist approaches, and the unmasking of encyclopedic neutrality? This research article will explore how critiques of AI expert systems and Cyc, an ongoing project to create a common sense knowledge base, might apply to Semantic Web efforts like Wikipedia, Wikidata, DBpedia, and Schema.org.
Project Reports
-
The South Asian Canadian Digital Archive Thesaurus: Thesaurus Construction for the South Asian Canadian Diaspora
Magnus Berg, Satwinder Kaur Bains and Sadhvi Suri
pp. 1–7
Abstract:
The South Asian Canadian Digital Archive (SACDA) is a soon-to-be-released digital repository developed by the South Asian Studies Institute at the University of the Fraser Valley, located in Abbotsford, British Columbia, Canada. SACDA partners with memory institutions, individuals, families, and organizations to digitize, describe, and provide online public access to heritage materials created by, or relevant to, the South Asian Canadian diaspora. This project report will detail how SACDA is building a customized thesaurus to classify its digitized archival holdings, augment existing subject headings and thesauri, and fill in taxonomical gaps. Building on prior work done by alternative thesauri like the Homosaurus, Association for Manitoba Archives Indigenous Subject Headings, Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri, and the International Thesaurus of Refugee Terminology, among others, the SACDA thesaurus intends to fill in a vital gap in South Asian Studies subject control, particularly from a Canadian perspective.
-
Modelling Linked Data for Conservation: A Call for New Standards
Ryan Lieu and Alberto Campagnolo
pp. 1–8
Abstract:
Conservation documentation serves an invaluable role in the history of cultural property, and conservators are bound by professional ethics to maintain accurate, clear, and permanent documentation about their work. Though many well-documented schemata exist for describing the holdings of memory organizations, none are designed to capture conservation documentation data in a semantically meaningful way. Conservation data often includes deeply detailed observations about the physical structure, materiality, and condition state of an object and how these characteristics change over time. When included with descriptive catalog metadata, these conservation data points typically manifest in seldom-used fields as free-text notes written with inconsistently applied standards and uncontrolled vocabularies. Beyond the traditional scope of descriptive metadata, conservation treatment documentation includes event-oriented data that captures a sequence of steps taken by the conservator, the addition and removal of material, and cause-and-effect relationships between observed conditions and treatment decisions made by a conservator. In 2020, the Linked Conservation Data Consortium conducted a pilot project to transform unstructured conservation data into linked data. Participants examined potential models in the library field and ultimately chose to conform to the Comité International pour la Documentation (CIDOC) Conceptual Reference Model (CRM) for its accommodation of event-oriented data and detailed descriptive attribution. Project technologists worked with real report data from four institutions to create XML data models and map newly structured data to the CRM. The pilot group then imported CRM-modelled datasets into a discovery environment, developed queries to reconcile the divergent datasets, and created knowledge maps and charts in response to a small set of predetermined research questions. 
Feedback from conservators attending workshop activities revealed a shared need for conservation data standards and guidelines for those developing documentation templates and databases. Project outcomes signalled the necessity of further developing conservation vocabularies and ontologies to link datasets between institutions and from adjacent domains.
-
The Marmaduke Problem: A Case Study of Comics as Linked Open (Meta)data
Kate Topham, Julian Chambliss, Justin Wigard and Nicole Huff
pp. 1–8
Abstract:
Michigan State University (MSU) is home to one of the largest library comics collections in North America, holding over three hundred thousand print comic book titles and artifacts. Inspired by the interdisciplinary opportunity offered by digital humanities practice, a research collaborative linked to the MSU Library Digital Scholarship Lab (DSL) developed a Collections as Data project focused on the Comic Art Collection. This team extracted and cleaned over forty-five thousand MARC records describing comics published in Canada, Mexico, and the United States. The dataset is openly available through a GitLab repository, where the team has shared data visualizations so that scholars and members of the public can explore and interrogate this unique collection. In order to bridge digital humanities with the popular culture legacy of the institution, the MSU comics community turned to bibliographic metadata as a new way to leverage the collection for scholarly analysis. In October 2020, the Department of English Graphic Possibilities Research Workshop gathered a group of scholars, librarians, Wikidatians, and enthusiasts for a virtual Wikidata edit-a-thon. This project report will present this event as a case study to discuss how linked open metadata may be used to create knowledge and how community knowledge can, in turn, enrich metadata. We explore not only how our participants utilized the open-access tool Mix’n’match to connect the Comic Art Collection dataset to Wikidata and increase awareness of lesser-known authors and regional publishers missing from OCLC and Library of Congress databases, but also how the knowledge of this community in turn revealed issues of authority control.