A hybrid approach for predicting missing follower–followee links in social networks using topological features with ensemble learning

DOI: https://doi.org/10.1108/DTA-02-2022-0072
Published date: 09 July 2022
Pages: 131-153
Subject Matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Author: Riju Bhattacharya, Naresh Kumar Nagwani, Sarsij Tripathi
Riju Bhattacharya and Naresh Kumar Nagwani
Department of Computer Science & Engineering, National Institute of Technology
Raipur, Raipur, India, and
Sarsij Tripathi
Department of Computer Science & Engineering, Motilal Nehru National Institute
of Technology, Allahabad, India
Abstract
Purpose: Social networking platforms are increasingly using the Follower Link Prediction tool in an
effort to expand the number of their users. It facilitates the discovery of previously unidentified individuals
and can be employed to determine the relationships among the nodes in a social network. On the other
hand, social site firms use follower–followee link prediction (FFLP) to increase their user base. FFLP can
help identify unfamiliar people and determine node-to-node links in a social network. Choosing the
appropriate person to follow becomes crucial as the number of users increases. A hybrid model
employing the Ensemble Learning algorithm for FFLP (HMELA) is proposed to advise the formation of
new follower links in large networks.
Design/methodology/approach: HMELA includes fundamental classification techniques for treating link
prediction as a binary classification problem. The data sets are represented using a variety of
machine-learning-friendly hybrid graph features. The HMELA is evaluated using six real-world social network data sets.
Findings: The first set of experiments used exploratory data analysis on a di-graph to produce a balanced
matrix. The second set of experiments compared the benchmark and hybrid features on data sets. This was
followed by using benchmark classifiers and ensemble learning methods. The experiments show that the
proposed (HMELA) method predicts missing links better than other methods.
Practical implications: A hybrid model for link prediction is proposed in this paper. The
suggested HMELA model makes use of AUC scores to predict new future links. The proposed approach
facilitates comprehension and insight into the domain of link prediction. This work is aimed primarily
at academics, practitioners and others involved in the field of social networks. The model is also
effective in the field of product recommendation and in recommending new friends and users on social
networks.
Originality/value: The outcome on six benchmark data sets revealed that when the HMELA strategy had
been applied to all of the selected data sets, the area under the curve (AUC) scores were greater than when
individual techniques were applied to the same data sets. Using the HMELA technique, the maximum AUC
score in the Facebook data set has been increased by 10.3 per cent from 0.8449 to 0.9479. There has also been
an 8.53 per cent increase in the accuracy of the Net Science, Karate Club and USAir databases. As a result, the
HMELA strategy outperforms every other strategy tested in the study.
Keywords: Social network, Link prediction, Evaluation parameters, Ensemble learning methods, Followers, AUC
Paper type: Research paper
1. Introduction
Social networks are made up of a collection of social actors or stakeholders and their
linkages. This is usually represented as a graph, where the actors are represented by
nodes and the relations between them are represented by links/edges. A social network
goes through constant changes; one of the most important traits of a social network is its
ability to change. In reality, this could be seen from two distinct viewpoints: the first one
The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/2514-9288.htm
Received 16 February 2022
Revised 19 May 2022
Accepted 6 June 2022
Data Technologies and
Applications
Vol. 57 No. 1, 2023
pp. 131-153
© Emerald Publishing Limited
2514-9288
DOI 10.1108/DTA-02-2022-0072
being the growth of new users on the network and the second being the creation of
linkages in the complex network. Complex networks are better representations of
real-world networks, where problems exhibit realistic and more complex patterns.
Social networks are very dynamic in nature, and they expand and evolve rapidly over
time by adding new links between users, meaning new connections in the social
framework underlying the network. The process by which they grow is a key issue that
is still not understood, and identifying it forms the impetus for our work
(Liben-Nowell and Kleinberg, 2007). The introduction of new members
to social networking sites increases complex activities (Liu et al., 2019). Predicting the
likelihood that an association exists between two nodes is critical in social networks and
data mining (Badis et al., 2018). This problem is also referred to as link prediction. The
purpose of link prediction is to predict the edges that are expected to form between times t and
t+n (where t < t+n), as shown in Figure 1.
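The setup described above can be sketched as a labelling step: every ordered pair of nodes that is not linked at time t becomes a candidate, and it is labelled 1 if the link appears by time t+n and 0 otherwise. The following is only a minimal illustration under assumptions; the function name `candidate_labels`, the toy node names and the toy edges are hypothetical and are not taken from the paper's data sets or implementation.

```python
def candidate_labels(edges_t, edges_t_plus_n):
    """Label each ordered, non-linked pair at time t: 1 if the link exists at t+n, else 0."""
    # Nodes are taken from the snapshot at time t (assumption: no new nodes are scored).
    nodes = sorted({node for edge in edges_t for node in edge})
    existing = set(edges_t)
    future = set(edges_t_plus_n)
    labels = {}
    for u in nodes:
        for v in nodes:
            # Directed follower -> followee pairs; skip self-pairs and existing links.
            if u != v and (u, v) not in existing:
                labels[(u, v)] = 1 if (u, v) in future else 0
    return labels


# Hypothetical toy snapshots of a follower network at time t and at time t+n.
edges_t = [("a", "b"), ("b", "c"), ("c", "a")]
edges_t_plus_n = edges_t + [("a", "c"), ("b", "a")]

labels = candidate_labels(edges_t, edges_t_plus_n)
positives = sorted(pair for pair, label in labels.items() if label == 1)
print(labels)
print(positives)
```

A classifier would then be trained on features of each candidate pair against these labels; which features and classifiers are used is a separate choice that this sketch does not cover.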
Most of the study of link prediction has been done in the last few years as a result of its use in
several disciplines, namely, complex evolving online social networks (Almgren and Lee, 2016;
Ahuja et al., 2019), making suggestions for peers on social media (Ma et al., 2016; Shabaz and
Garg, 2021), chemoinformatics of three-dimensional chemical molecule structure, ecological
systems of species (Nikolentzos et al., 2021), discovering hidden relationships in the field of
security (Kumar et al., 2020; Daud et al., 2020), citation networks (Zhou et al., 2018) and social
relationships of users in personalized recommender systems (Ebrahimi and Golpayegani, 2016).
Several approaches have been suggested to solve the problem of predicting probable new links in
social networks. Local similarity-based techniques assume that nodes
with shared neighbor structures may develop a link in the future. Since they utilize local
topological knowledge of nodes rather than the topology of the entire network, they are
easier to compute. Most of these tools focus on node degree and surrounding neighborhood
attributes, ignoring connections to other neighborhoods (Nassar et al., 2020; Ahmad et al.,
2020). Also, shared neighbors, Adamic–Adar and resource allocation do not include
neighborhood requirements for directed graphs. These measurements do not distinguish
between directed and undirected graphs.
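To make the three local measures concrete, the sketch below computes common neighbors (CN), Adamic–Adar (AA) and resource allocation (RA) in their usual textbook forms: CN counts the shared neighbors, AA sums 1/log(degree) over them, and RA sums 1/degree over them. It is an illustrative sketch only, not the paper's implementation; the helper names and the toy follower edges are hypothetical. Edge direction is discarded when the neighbor sets are built, which is one way to see why these scores cannot tell a directed graph from an undirected one.

```python
import math


def undirected_neighbors(edges):
    """Build neighbor sets while ignoring edge direction."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def local_scores(nbrs, x, y):
    """Return (CN, AA, RA) for two distinct nodes x and y."""
    common = nbrs.get(x, set()) & nbrs.get(y, set())
    cn = len(common)
    # A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    aa = sum(1.0 / math.log(len(nbrs[z])) for z in common)
    ra = sum(1.0 / len(nbrs[z]) for z in common)
    return cn, aa, ra


# Hypothetical follower -> followee edges.
follows = [("a", "c"), ("b", "c"), ("c", "d"), ("e", "d")]
nbrs = undirected_neighbors(follows)

# Reversing one edge leaves the neighbor sets, and therefore every score, unchanged.
reversed_one = [("c", "a"), ("b", "c"), ("c", "d"), ("e", "d")]
nbrs_reversed = undirected_neighbors(reversed_one)

print(local_scores(nbrs, "a", "b"))  # one shared neighbor, "c"
print(local_scores(nbrs, "a", "e"))  # no shared neighbor, so all three scores are zero
```

A direction-aware variant would need separate in-neighbor and out-neighbor sets; that design choice is outside what this sketch shows.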
The discovered constraints of local similarity algorithms can be described here:
(1) The observed pair of nodes is unlikely to have a common neighbor (CN) at all steps.
However, in real systems, new connections between nodes without a shared
neighborhood are formed. Therefore, local topological measures cannot explain this
phenomenon.
Figure 1. Example of a social network at time t and t+n