Multidimensional appropriate clustering and DBSCAN for SAT solving

Date04 February 2019
Published date04 February 2019
Pages85-107
DOIhttps://doi.org/10.1108/DTA-07-2018-0068
AuthorCelia Hireche,Habiba Drias
Subject MatterLibrary & information science,Librarianship/library management,Library technology,Information behaviour & retrieval,Metadata,Information & knowledge management,Information & communications technology,Internet
Multidimensional appropriate
clustering and DBSCAN for
SAT solving
Celia Hireche and Habiba Drias
Department of Computer Science,
University of Sciences and Technology Houari Boumediene, Algiers, Algeria
Abstract
Purpose This paper is an extended version of Hireche and Drias (2018) presented at the WORLD-CIST18
conference. The major contribution, in this work, is defined in two phases. First of all, the use of data mining
technologies and especially the tools of data preprocessing for instances of hard and complex problems prior
to their resolution. The authors focus on clustering the instance aiming at reducing its complexity. The second
phase is to solve the instance using the knowledge acquired in the first step and problem-solving methods.
The paper aims to discuss these issues.
Design/methodology/approach Because different clustering techniques may offer different results for a
data set, a prior knowledge on data helps to determine the adequate type of clustering that should be applied.
The first part of this work deals with a study on data descriptive characteristics in order to better understand
the data. The dispersion and distribution of the variables in the problem instances is especially explored to
determine the most suitable clustering technique to apply.
Findings Several experiments were performed on different kinds of instances and different kinds of data
distribution. The obtained results show the importance and the efficiency of the proposed appropriate
preprocessing approaches prior to problem solving.
Practical implications The proposed approach is developed, in this paper, on the Boolean satisfiability
problem because of its well-recognised importance, with the aim of complexity reduction which allows an
easier resolution of the later problem and particularly an important time saving.
Originality/value State of the art of problem solving describes plenty of algorithms and solvers of hard
problems that are still a challenge because of their complexity. The originality of this work lies on
the investigation of appropriate preprocessing techniques to tackle and overcome this complexity prior to the
resolution which becomes easier with an important time saving.
Keywords Problem solving, Computational complexity, BSO, Data distribution, Data mining techniques,
Satisfiability problem
Paper type Research paper
1. Introduction and motivation
The satisfiability (SAT) problem, being the first problem to be proven NP-complete
(Cook, 1971), is considered as the backbone of the NP-completeness and represents one of
the most known and studied problem, especially for the AI community. It consists in
determining whether a conjunctive normal form (CNF) formula can be set to true or not.
A CNF formula is defined by a conjunction of clauses, each clause being a disjunction of
literals where a literal is defined as a Boolean variable xor its negation x.
In this paper, we propose to cope with the complexity of the SAT problem by applying a
clustering preprocessing for complexity reduction. The clustering consists in dividing a set
of data into groups, according to the similarity between the data. The elements of the same
cluster have to be the most similar as possible contrarily to the elements of different clusters
which have to be the most dissimilar as possible. A large set of clustering techniques exists,
and can be categorised according to the manner of data splitting, the kind of data on which
they work and so other criteria.
On the same data set, these clustering techniques may generate different results
(clusters). It becomes primordial to get certain knowledge about the data prior to the
selection of the clustering method in order to achieve effectiveness. In this work, we will
show how to develop this new and original approach on the variables of SAT instances.
Data Technologies and
Applications
Vol. 53 No. 1, 2019
pp. 85-107
© Emerald PublishingLimited
2514-9288
DOI 10.1108/DTA-07-2018-0068
Received 28 July 2018
Revised 28 November 2018
Accepted 20 December 2018
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/2514-9288.htm
85
DBSCAN for
SAT solving
The preprocessing step aims at reducing the complexity of the problem, and hence
guaranties an easier resolution of the problem instance and an important time saving. In
fact, the resulted clusters can be solved independently using either the DavisPutnam
LogemannLoveland (DPLL) algorithm (Davis et al., 1962) or bee swarm optimisation (BSO)
algorithm (Drias et al., 2005) according to the number of variables to be instantiated.
The remainder of this paper is as follows: the next section defines the SAT problem. Section 3
introduces some related works relative to clustering techniques and SAT solvers, followed by a
section dedicated to some data preprocessing before clustering. Section 5 exhibits the different
clustering techniques used according to the data distribution. Section 6 introduces the
second phase of the work and presents the solving algorithms. We enchain with a case study in
Section 7. The conducted experiments and the obtained results are presented and discussed
in Section 8. Finally, some conclusions and perspectives are presented.
2. The satisfiability problem: definition
The Boolean SAT problem is defined by a set of clauses where each clause is a disjunction of
literals. Each literal being a variable xor its negation x, the aim being to find the variables
assignment that makes the problems instance true, i.e. all the clauses have to be true in the
same time. A clause is said to be true if and only if at least one of its literal is true.
SAT problem is defined formally as in Garey and Johnson (1979) by the following pair:
Instance a set of binary variables V¼{x
1
,x
2
,,x
n
} and a set of mclauses C¼{C
1
,
C
2
,,C
m
} formed by the nvariables.
Question: is the CNF formula ϕ¼C
1
^C
2
^^Cm satisfiable? That is, does any
variables assignation allow the formula to be true?
Remark: SAT clauses have to respect two conditions: a clause cannot contain a variable x
i
and its negation xi(tautology); and a clause cannot contain more than one occurrence of the
same variable x
i
.
Example 1: let us consider the set of variables V¼{x
1
,x
2
,x
3
,x
4
,x
5
} and the set of clauses
C¼{C
1
,C
2
,C
3
,C
4
}. An example of SAT instance would be defined as follows:
C1¼x1;x2;x3.
C
2
¼x
2
,x
3
,x
5
.
C
3
¼x
1
,x
4
.
C
4
¼x
1
,x
2
.
One possible variables assignation that would satisfy the SAT instance is {x
1
¼1, x
2
¼1,
x
3
¼1, x
4
¼1, x
5
¼1}.
3. Related works
3.1 Clustering techniques
Clustering (Han et al., 2011) is nowadays one of the widely used data mining tool that consists in
dividing a set of data into clusters considering the similarity between the data. The objects that
have a high similarity are regrouped in the same cluster, while the objects having a high
dissimilarity cannot belong to the same group. Plenty of clustering techniques exist, nowadays.
The partitioning methods (Han et al., 2011) represent the most intuitive methods that
divide the whole data into a determined knumber of clusters. Using an objective function
makes it unsure that the elements of the same cluster are the most similar as possible,
contrarily to the elements of different clusters that are the most dissimilar as possible.
K-means (Lloyd, 1982; Han et al., 2011) is one of the most popular partitioning algorithms,
where the clusters centroid is the mean value of the cluster. Its general principle is to first
86
DTA
53,1

To continue reading

Request your trial

VLEX uses login cookies to provide you with a better browsing experience. If you click on 'Accept' or continue browsing this site we consider that you accept our cookie policy. ACCEPT