Aslib Journal of Information Management

Emerald Group Publishing Limited

Latest documents

  • Editorial
  • Task design and assignment of full-text generation on mass Chinese historical archives in digital humanities. A crowdsourcing approach

    Purpose: The purpose of this paper is to explore the task design and assignment of full-text generation on mass Chinese historical archives (CHAs) by crowdsourcing, with special attention paid to how best to divide full-text generation tasks into smaller ones assigned to crowdsourced volunteers and to improve the digitization of mass CHAs and the data-oriented processing of the digital humanities. Design/methodology/approach: This paper starts from the complexities of character recognition of mass CHAs, takes the Sheng Xuanhuai archives crowdsourcing project of the Shanghai Library as a case study, and makes use of the theories of archival science, including diplomatics of Chinese archival documents, and the historical approach of Chinese archival traditions as the theoretical basis and analysis methods. The results are generated through comprehensive research. Findings: This paper points out that volunteer tasks of full-text generation include transcription, punctuation, proofreading, metadata description, segmentation, and attribute annotation in digital humanities. It provides a metadata element set for volunteers to use in creating or revising metadata descriptions, as well as an attribute tag set. The two sets can be used across the humanities to construct overall observations about texts and the archives of which they are a part. Along these lines, this paper presents significant insights for application in outlining the principles, methods, activities, and procedures of crowdsourced full-text generation for mass CHAs. Originality/value: This study is the first to explore and identify the effective design and allocation of tasks for crowdsourced volunteers completing full-text generation on CHAs in digital humanities.

  • Crowdfunding in digital humanities: some evidence from Indonesian social enterprises

    Purpose: This article aims to understand how social enterprises adopt crowdfunding in digital humanities by investigating mission drift, risk sharing and human resource practices. Design/methodology/approach: This exploratory study uses a qualitative method by observing five different social ventures in Indonesia. The case study involves observation of social enterprises that are concerned with digital humanities projects and interviews with those who manage the crowdfunding for financing the projects as the key respondents. The analysis uses an interpretative approach by involving the respondents to explain the phenomena. Findings: (1) Adopting the crowdfunding platform encourages social enterprises to reshape social missions with more responsive action for digital humanities. (2) Crowdfunding allows social enterprises to share the risk with stakeholders who focus on fostering the social impact of digital humanities. (3) Crowdfunding stimulates social enterprises to hire professional workers with flexible work arrangements to attract specific donors and investors. Originality/value: The result extends the principles of social enterprises by introducing some concepts of crowdfunding in digital humanities. This study also explains the boundary conditions of digital humanities projects and how crowdfunding can support the projects by adopting the principles of the social enterprise that works on digital humanities projects.

  • A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry

    Purpose: The purpose of this paper is to propose a knowledge extraction framework to extract knowledge, including entities and relationships between them, from unstructured texts in digital humanities (DH). Design/methodology/approach: The proposed cooperative crowdsourcing framework (CCF) uses both human–computer cooperation and crowdsourcing to achieve high-quality and scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism to help domain experts label and verify extracted knowledge. Findings: The case study shows that CCF can effectively and efficiently extract knowledge from multi-sourced heterogeneous data in the field of Tang poetry. Specifically, CCF achieves higher accuracy of knowledge extraction than the state-of-the-art methods; the contribution of feedback to the training model can be maximized by the active learning mechanism; and the proposed category-based crowdsourcing mechanism can scale up the effective human–computer collaboration by considering the specialization of workers in different categories of tasks. Research limitations/implications: This research proposes CCF to enable high-quality and scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing domain knowledge and experts. Practical implications: The extracted knowledge is machine-understandable and can support the research of Tang poetry and knowledge-driven intelligent applications in DH. Originality/value: CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms; the human–computer cooperation method uses the feedback of domain experts through the active learning mechanism; the category-based crowdsourcing mechanism considers the matching of categories of DH data and especially of domain experts.

  • The influences of social value orientation and domain knowledge on crowdsourcing manuscript transcription. An empirical investigation of the Transcribe-Sheng project

    Purpose: The purpose of this paper is to explore how social value orientation and domain knowledge affect cooperation levels and transcription quality in crowdsourced manuscript transcription, and contribute to the recruitment of participants in such projects in practice. Design/methodology/approach: The authors conducted a quasi-experiment using Transcribe-Sheng, which is a well-known crowdsourced manuscript transcription project in China, to investigate the influences of social value orientation and domain knowledge. The experiment lasted one month and involved 60 participants. ANOVA was used to test the research hypotheses. Moreover, interviews and thematic analyses were conducted to analyze the qualitative data in order to provide additional insights. Findings: The analysis confirmed that in crowdsourced manuscript transcription, social value orientation has a significant effect on participants’ cooperation level and transcription quality; domain knowledge has a significant effect on participants’ transcription quality, but not on their cooperation level. The results also reveal the interactive effect of social value orientation and domain knowledge on cooperation levels and quality of transcription. The analysis of the qualitative data illustrated the influences of social value orientation and domain knowledge on crowdsourced manuscript transcription in detail. Originality/value: Researchers have paid little attention to the impacts of the psychological and cognitive factors on crowdsourced manuscript transcription. This study investigated the effect of social value orientation and the combined effect of social value orientation and domain knowledge in this context. The findings shed light on crowdsourcing transcription initiatives in the cultural heritage domain and can be used to facilitate participant selection in such projects.

  • The topic of terrorism on Yahoo! Answers: questions, answers and users’ anonymity

    Purpose: The purpose of this paper is to explore the use of community question answering sites (CQAs) on the topic of terrorism. Three research questions are investigated: what are the dominant themes reflected in terrorism-related questions? How do answer characteristics vary with question themes? How does users’ anonymity relate to question themes and answer characteristics? Design/methodology/approach: Data include 300 questions that attracted 2,194 answers on the community question answering site Yahoo! Answers. Content analysis was employed. Findings: The questions reflected the community’s information needs ranging from the life of extremists to counter-terrorism policies. Answers were laden with negative emotions reflecting hate speech and Islamophobia, making claims that were rarely verifiable. Users who posted sensitive content generally remained anonymous. Practical implications: This paper raises awareness of how CQAs are used to exchange information about sensitive topics such as terrorism. It calls for governments and law enforcement agencies to collaborate with major social media companies to develop a process for cross-platform blacklisting of users and content, as well as identifying those who are vulnerable. Originality/value: Theoretically, it contributes to the academic discourse on terrorism in CQAs by exploring the type of questions asked, and the sort of answers they attract. Methodologically, the paper serves to enrich the literature around terrorism and social media that has hitherto mostly drawn data from Facebook and Twitter.

  • Knowledge management processes, knowledge worker satisfaction, and organizational performance. Symmetric and asymmetrical analysis

    Purpose: Drawing on the knowledge-based view, the purpose of this paper is to investigate the interrelationship between Knowledge Management (KM) processes, Knowledge Worker Satisfaction (KWS) and Organizational Performance (OP). Additionally, the study seeks to identify the combinations of KM processes and KWS dimensions that can lead to enhanced OP. Design/methodology/approach: Data were collected from 248 academics and administration employees of Higher Education Institutions (HEIs). The relationships were tested using SmartPLS 3.2.7. The study also employed fuzzy set Qualitative Comparative Analysis (fsQCA) for examining configurational paths. Findings: The results of the study revealed that KM processes significantly affect KWS and KWS enhances OP in HEIs. Based on fsQCA, the results revealed multiple configurational paths to improved OP. Originality/value: There is a significant lack of research that ascertains the interrelationship between KM processes, KWS, and OP. This is one of the initial studies that examines the relationship of KM processes, KWS, and OP in HEIs. From a methodological perspective, the study contributes by combining symmetric and asymmetric statistical tools in KM literature. fsQCA helps to understand the interactions that might not be immediately obvious through traditional symmetric methods.

  • Factors hindering shared files retrieval

    Purpose: Personal information management (PIM) is an activity in which people store information items in order to retrieve them later. The purpose of this paper is to test and quantify the effect of factors related to collection size, file properties and workload on file retrieval success and efficiency. Design/methodology/approach: In the study, 289 participants retrieved 1,557 of their shared files in a naturalistic setting. The study used specially developed software designed to collect shared files’ names and present them as targets for the retrieval task. The dependent variables were retrieval success, retrieval time and misstep/s. Findings: Various factors compromise shared files retrieval including: collection size (large number of files), file properties (multiple versions, size of team sharing the file, time since most recent retrieval and folder depth) and workload (daily e-mails sent and received). The authors discuss theoretical reasons for these negative effects and suggest possible ways to overcome them. Originality/value: Retrieval is the main reason people manage personal information. It is essential for retrieval to be successful and efficient, as information cannot be used unless it can be re-accessed. Prior PIM research has assumed that factors related to collection size, file properties and workload affect file retrieval. However, this is the first study to systematically quantify the negative effects of these factors. As each of these factors is expected to be exacerbated in the future, this study is a necessary first step toward addressing these problems.

  • An empirical analysis of search engines’ response to web search queries associated with the classroom setting

    Purpose: The purpose of this paper is to examine strengths and limitations that search engines (SEs) exhibit when responding to web search queries associated with the grade school curriculum. Design/methodology/approach: The authors employed a simulation-based experimental approach to conduct an in-depth empirical examination of SEs and used web search queries that capture information needs in different search scenarios. Findings: Outcomes from this study highlight that child-oriented SEs are more effective than traditional ones when filtering inappropriate resources, but often fail to retrieve educational materials. All SEs examined offered resources at reading levels higher than that of the target audience and often prioritized resources with popular top-level domains (e.g. “.com”). Practical implications: Findings have implications for human intervention, search literacy in schools, and the enhancement of existing SEs. Results shed light on the impact on children’s education that results from introducing misconceptions about SEs when these tools either retrieve no results or offer irrelevant resources in response to web search queries pertinent to the grade school curriculum. Originality/value: The authors examined child-oriented and popular SEs’ retrieval of resources aligning with task objectives and user capabilities: resources that match user reading skills, do not contain hate speech or sexually explicit content, are non-opinionated, and are curriculum-relevant. Findings identified limitations of existing SEs (whether directly or indirectly supporting young users) and demonstrate the need to improve SE filtering and ranking algorithms.

  • Toward the optimized crowdsourcing strategy for OCR post-correction

    Purpose: Digitization of historical documents is a challenging task in many digital humanities projects. A popular approach for digitization is to scan the documents into images, and then convert images into text using optical character recognition (OCR) algorithms. However, the outcome of OCR processing of historical documents is usually inaccurate and requires post-processing error correction. The purpose of this paper is to investigate how crowdsourcing can be utilized to correct OCR errors in historical text collections, and which crowdsourcing methodology is the most effective in different scenarios and for various research objectives. Design/methodology/approach: A series of experiments with different micro-task structures and text lengths was conducted with 753 workers on Amazon’s Mechanical Turk platform. The workers had to fix OCR errors in a selected historical text. To analyze the results, new accuracy and efficiency measures were devised. Findings: The analysis suggests that in terms of accuracy, the optimal text length is medium (paragraph-size) and the optimal structure of the experiment is two-phase with a scanned image. In terms of efficiency, the best results were obtained when using longer text in the single-stage structure with no image. Practical implications: The study provides practical recommendations to researchers on how to build the optimal crowdsourcing task for OCR post-correction. The developed methodology can also be utilized to create gold standard historical texts for automatic OCR post-correction. Originality/value: This is the first attempt to systematically investigate the influence of various factors on crowdsourcing-based OCR post-correction and propose an optimal strategy for this process.
