Large-scale analysis of query logs to profile users for dataset search


DOI	https://doi.org/10.1108/JD-12-2021-0245
Published date	27 April 2022
Pages	66-85
Subject Matter	Library & information science, Records management & preservation, Document management, Classification & cataloguing, Information behaviour & retrieval, Collection building & management, Scholarly communications/publishing, Information & knowledge management, Information management & governance, Information management, Information & communications technology, Internet
Author	Romina Sharifpour, Mingfang Wu, Xiuzhen Zhang


Romina Sharifpour

School of Computing Technologies, RMIT University, Melbourne, Australia

Mingfang Wu

ARDC, Caulfield East, Australia

Xiuzhen Zhang

School of Computing Technologies, RMIT University, Melbourne, Australia

Abstract

Purpose – With an explosion of datasets available on the Web, dataset search has gained attention as an emerging research domain. Understanding users’ dataset search behaviour is imperative for providing effective data discovery services. In this paper, the authors present a study of users’ dataset search behaviour through the analysis of search logs from a research data discovery portal.

Design/methodology/approach – Using query- and session-based features, the authors apply cluster analysis to discover distinct user profiles with different search behaviours. One behavioural construct of particular interest is users’ expertise, which the authors generate by computing the semantic similarity between users’ search queries and the titles of metadata records in the displayed search results.
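The expertise construct can be illustrated with a minimal sketch. It assumes, as the keywords (semantic text similarity, word embedding) suggest, that a query and each displayed title are represented by the average of their word vectors and compared with cosine similarity; the aggregation over the results page (a plain mean here), the function names and the toy two-dimensional vectors are all hypothetical stand-ins, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def embed(text: str, vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens (a bag-of-embeddings text vector)."""
    dim = len(next(iter(vectors.values())))  # assumes a non-empty vocabulary
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return [0.0] * dim  # no known words: zero vector, so similarity will be 0
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expertise_score(query: str, result_titles: Sequence[str],
                    vectors: Dict[str, Vector]) -> float:
    """Hypothetical score: mean query-to-title similarity over the displayed titles."""
    if not result_titles:
        return 0.0
    q = embed(query, vectors)
    return sum(cosine(q, embed(t, vectors)) for t in result_titles) / len(result_titles)


# Toy 2-d vectors for illustration only; real use would load pretrained embeddings.
toy = {"soil": [1.0, 0.0], "moisture": [0.9, 0.1], "carbon": [0.8, 0.2],
       "data": [0.1, 1.0], "download": [0.0, 1.0]}
titles = ["soil carbon", "soil moisture"]
print(expertise_score("soil moisture", titles, toy) > expertise_score("data download", titles, toy))
# prints True
```

Under this assumption, a query phrased in the same vocabulary as the catalogue's titles scores close to 1, while a generic query scores low; whether a high score should be read as domain expertise is the interpretive step the study itself makes.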

Findings – The findings revealed six distinct classes of user behaviour for dataset search, namely: Expert Research, Expert Search, Expert Explore, Novice Research, Novice Search and Novice Explore.
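The abstract says only that cluster analysis over query- and session-based features yields these profiles; it does not name the algorithm. As an illustration only, the sketch below assumes k-means (Lloyd's algorithm) on standardised per-session feature vectors, with k = 6 to mirror the six profiles. The feature choice hinted at in the comments (queries per session, mean query length, an expertise score) is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def standardise(rows: Sequence[Point]) -> List[Point]:
    """Z-score each feature column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: returns (centroids, cluster label for each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical session rows: (queries per session, mean query terms, expertise score).
# With a real log one would call kmeans(standardise(rows), k=6).
```

Standardising first matters because raw session features live on very different scales; without it, a count such as queries per session would dominate a similarity score bounded by 1.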

Research limitations/implications – The user profiles are derived from analysis of the search log of the research data catalogue in this study. Further research is needed to generalise the user profiles to other dataset search settings. Future research can take a confirmatory approach to verify these user groups and establish a deeper understanding of their information needs.

Practical implications – The findings in this paper have implications for designing search systems that tailor search results to the diverse information needs of different user groups.

Originality/value – The authors propose, for the first time, a taxonomy of users for dataset search based on their domain expertise and search behaviour.

Keywords Dataset search, Log analysis, Search behaviour, Clustering, Semantic text similarity, Word embedding

Paper type Research paper

1. Introduction

Recent years have witnessed phenomenal growth in the amount of data produced, stored and curated. Increased computational power and the ability to store massive amounts of data at low cost have led to the emergence and collection of a massive number of open datasets available on the Web. Dataset search is typically achieved through metadata, which provides information about a dataset or a collection of datasets, such as its title, description and creator. Currently, there are thousands of online data repositories providing metadata and access to millions of datasets from governments, research institutions, scientific publishers and data brokers. The more datasets are published, the more complex the problem of dataset discovery becomes (Brickley et al., 2019).
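To make "search through metadata" concrete, here is a deliberately simple sketch: a hypothetical record type holding the title, description and creator fields named above, and a search that ranks records by how many query terms appear in the title or description. Term-overlap ranking is an assumption chosen for brevity; it is not how any particular portal ranks results.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MetadataRecord:
    """Hypothetical minimal metadata record with the fields named in the text."""
    title: str
    description: str
    creator: str


def search(records: List[MetadataRecord], query: str) -> List[MetadataRecord]:
    """Rank records by the number of query terms found in title or description."""
    terms = set(query.lower().split())
    scored: List[Tuple[int, MetadataRecord]] = []
    for rec in records:
        words = set(f"{rec.title} {rec.description}".lower().split())
        score = len(terms & words)
        if score:  # drop records that share no terms with the query
            scored.append((score, rec))
    # Higher overlap first; Python's sort is stable, so ties keep catalogue order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored]


catalogue = [
    MetadataRecord("Daily rainfall", "rainfall observations by station", "Agency A"),
    MetadataRecord("Soil moisture", "soil moisture probe readings", "Agency B"),
]
print([r.title for r in search(catalogue, "soil moisture")])  # prints ['Soil moisture']
```

The sketch also shows why relevance judgement is hard in this setting: the user sees only these descriptive fields, not the data themselves.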

Understanding user behaviour is known to be central to improving data discovery services (Arapakis et al., 2014), and for the same reason extensive research has investigated users’ information-seeking behaviour. Previous research attempting to understand users’ search behaviour has primarily focused on searching for documents or information within a Web search setting (Bhavnani, 2002; Jansen and Spink, 2006) or Digital Libraries (Gross and Taylor, 2005), and on searching for products on E-commerce websites (Sondhi et al., 2018). There is a general consensus in past research that various user characteristics, such as search expertise, domain knowledge of the search topics and cognitive factors, influence the way users perform searches, formulate queries and assess their search outcomes (White et al., 2009; Jansen et al., 2009; Wildemuth, 2004; Kathuria et al., 2010).

The authors thank the Australian Research Data Commons for making their search log dataset available for the study; special thanks to Mr. Joel Benn for extracting and helping clean up the search log dataset.

The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/0022-0418.htm

Received 21 December 2021
Revised 9 March 2022 and 28 March 2022
Accepted 29 March 2022

Journal of Documentation
Vol. 79 No. 1, 2023
pp. 66-85
0022-0418

Past research also distinguishes between domain expertise and search expertise. While domain knowledge relates to a searcher’s knowledge of the search topic, search expertise relates to knowledge of the search process and the ability to construct a query that yields high-precision search results (Wildemuth, 2004; Hölscher and Strube, 2000).

Both a searcher’s domain knowledge and Web search expertise are known to affect search strategy as well as search success. Among the research concerning domain knowledge, studies by Bhavnani (2002) and White and Drucker (2007) revealed that domain experts often started their research on websites containing key resources, rather than using general Web search engines. Others suggested that domain knowledge affects individuals’ ability to choose more diverse and suitable search terms (Vakkari, 2002; White and Drucker, 2007; White et al., 2009). Among the studies concerning Web search expertise, the results indicated differences between search experts and novices in their search process, with experts being characterised by specific query formulation and search strategies (White et al., 2009; White and Drucker, 2007).

Domain knowledge and search expertise are also known to affect each other. Past research suggested that domain knowledge influences users’ search tactics, such as adding concepts to or deleting concepts from the search query (Wildemuth, 2004), spending more time preparing search queries, and devising search queries that contain more specific vocabulary from the domain-specific lexicon (Hsieh-Yee, 1993).
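In a query log, tactics such as adding or deleting concepts can be approximated by comparing the term sets of consecutive queries within a session. The sketch below is one simple way to do this; treating each whitespace-separated term as a "concept" and the five labels used here are assumptions for illustration, not a scheme taken from the cited studies.

```python
from typing import List, Tuple


def classify_reformulation(prev: str, curr: str) -> str:
    """Label the change between two consecutive queries by comparing term sets."""
    a, b = set(prev.lower().split()), set(curr.lower().split())
    if a == b:
        return "same"
    if a < b:
        return "add"      # terms added, none removed
    if b < a:
        return "delete"   # terms removed, none added
    if a & b:
        return "modify"   # some terms kept, others swapped
    return "new"          # no shared terms: probably a new topic


def session_reformulations(queries: List[str]) -> List[Tuple[str, str, str]]:
    """Classify every consecutive query pair in one session."""
    return [(p, c, classify_reformulation(p, c)) for p, c in zip(queries, queries[1:])]


print(session_reformulations(["soil", "soil moisture", "soil carbon", "rainfall"]))
```

Counting these labels per session gives session-level features of the kind that can feed a behavioural clustering.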

The majority of existing research in the literature focuses on user behaviour when searching for textual documents/web pages, images or videos. Research on users’ dataset search behaviour remains limited, but interest in it is growing in the research community in light of the vast number of datasets becoming available on the Web due to Open Data initiatives (Carevic et al., 2020). Similarly, there is growing interest in designing information retrieval models specifically for dataset search (Chapman et al., 2020; Brickley et al., 2019). Furthermore, this body of emerging research suggests distinctions between searching for datasets and searching for information. Dataset search has been identified as more challenging than classical information search. This is mainly due to the diverse and distinctive nature of dataset search, which embeds both users’ complex information needs and their query formulation (Carevic et al., 2020). Additionally, dataset search involves more difficult selection decisions than information search (Kern and Mathiak, 2015). This is partly because the data needed for relevance judgement are not readily accessible within dataset metadata, making it difficult for users to understand the structure of datasets. Consequently, research to fully understand users’ dataset-seeking behaviour is imperative to the successful establishment of effective metadata or retrieval models that can satisfy the complex information needs of data search.

To date, a few studies have attempted to understand dataset users’ behaviour and intentions (Kacprzak et al., 2019; Carevic et al., 2020; Chen et al., 2019). These studies, however, remain limited in explaining various aspects of users’ behaviour. Most of this research, in particular, fails to consider individual differences, such as level of domain knowledge and Web search expertise, and how query formulation and search behaviour can vary in light of such differences. On the other hand, the majority of the research that characterises users’ behaviour relies heavily on in-depth interviews, quantitative surveys or lab experiments (Gregory et al., 2019; Jansen et al., 2009; Koesten et al., 2017). While these methods are very valuable in uncovering users’

