International Journal of Computational Intelligence Systems

Volume 12, Issue 2, 2019, Pages 1245 - 1254

A Contradiction Separation Dynamic Deduction Algorithm Based on Optimized Proof Search

Authors
Feng Cao1, 2, *, Yang Xu2, 3, Shuwei Chen2, 3, *, Jian Zhong1, 2, Guanfeng Wu2, 3
1School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, Sichuan, P.R. China
2National-Local Joint Engineering Laboratory of System Credibility Automatic Verification, Southwest Jiaotong University, Chengdu 610031, Sichuan, P.R. China
3School of Mathematics, Southwest Jiaotong University, Chengdu 610031, Sichuan, P.R. China
*Corresponding authors. Email: caofeng@my.swjtu.edu.cn; swchen@swjtu.edu.cn
Corresponding Authors
Feng Cao, Shuwei Chen
Received 19 August 2019, Accepted 15 October 2019, Available Online 12 November 2019.
DOI
10.2991/ijcis.d.191022.002How to use a DOI?
Keywords
Automated theorem proving; Binary resolution; Multi-clause inference rule; S-CS rule; Heuristic strategy
Abstract

Most of the advanced first-order logic automated theorem proving (ATP) systems adopt binary resolution methods as the core inference mechanism, where only two clauses are involved and a complementary pair of literals are eliminated during each deduction step. Recently, a novel multi-clause inference rule is introduced along with its soundness and completeness, which is called as standard contradiction separation rule (in short, S-CS rule) and allows multiple (two or more) clauses to be involved in each deduction step. This paper introduces and evaluates the application of S-CS rule in first-order logic ATP. Firstly, it analyzes several deduction methods of S-CS rule. It is then focused on how this multi-clause deduction theory can be achieved through forming a specific and effective algorithm, and finally how it can be applied in the top ATP systems in order to improve their performances. Concretely, two novel multi-clause S-CS dynamic deduction algorithms are proposed based on optimized proof search, including related heuristic strategy, then the application method applied in the state of the art ATP system Eprover (the version of Eprover 2.3) is introduced. Eprover with the proposed multi-clause deduction algorithms are evaluated through the FOF division of the CASC-J9 (in 2018) ATP system competition. Experimental results show that Eprover with the proposed multi-clause deduction algorithms outperform the plain Eprover itself to a certain extent.

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

Automated reasoning is an important research branch in the field of artificial intelligence, and it has been successfully applied in many areas [1], e.g., mathematics [2], formal verification [3]. Compared with program testing, formal verification is an important research work in the field of current information security, and automated reasoning plays an important role in formal verification. The automated theorem proving (ATP) in first-order logic is an important research work in the field of automated reasoning. ATP systems mainly adopt saturation [4,5], inference rule [612], superposition calculus [13] and heuristic strategies [1418], and have strong capability after several decades of development, e.g., Eprover [19], Vampire [20].

Saturation is a deduction algorithm for most famous first-order logic ATP systems [1921]. It is represented by given-clause algorithm, that can be divided into discount [4] and otter [5]. The basic idea is that the algorithm divides the whole clause set into passive and active. In each deduction step, a clause (called as current clause) is selected from passive by heuristic strategy and put into active. The inference rules are applied only in active, the generated new clauses are finally put into passive. Repeat this operation until an empty clause is generated or timeout. The given-clause algorithm can effectively plan proof search for theorem proving and is easy to implement. On the other hand, since the generated clauses are put into passive in each deduction step, the number of clauses in passive increases rapidly. As a result, it makes the effective selection of an optimal current clause in such a large number of clauses very difficult. In other words, current ATP systems have a great dependence on clause selection that is experimental to some extent. It is therefore of great significance to research new inference methods that can reduce the dependence on heuristic strategy.

As a major breakthrough in the field of automated reasoning, the resolution principle [6] and its refinements have become the main inference methods for current ATP systems. Assumed C1, C2 are two clauses in first-order logic without common variable, where C1=P1P2Pm, C2=Q1Q2Qn. If two literals of P1 and Q1 exist a unified complementary and the most general unifier σ, then RC1,C2=C1σP1σC2σQ1σ is called binary resolvent of C1 and C2. Binary resolution [6] is the core of resolution principle, and other resolution methods [2225] presented later are almost developed based on binary resolution. The state of the art resolution-based ATP systems mainly adopt binary resolution [6], unit-resulting resolution [9] (UR resolution), hyper-resolution [10], equality resolution [26], and conflict resolution [12]. From the literature review, we noticed that these resolution rules only have 2 clauses involved and eliminated only one complementary pair of literals in each deduction step, so the essence of these inference rules are still binary resolution. Binary resolution is a static method, only one complementary pair of literals eliminated in each deduction step, so the binary resolvent often has many literals, which will affect deduction efficiency. Due to the restriction of binary resolution, it is of great significance to research whether it can be improved, such as multi-clause resolution method.

Aiming at handling multiple input clauses synergizely in each deduction step, the S-CS (standard contradiction separation) rule [27] was proposed as a multi-clause dynamic deduction method, which can fully exert the synergized effect between more than two input clauses. It can eliminate a literal set from the input clauses by separating a standard contradiction, and the generated clauses usually have few literals by dynamic control. In addition, the constructed standard contradiction has a guidance effect on the selection of subsequent input clauses, which reduces the dependence on clause selection by heuristic strategy. It is an important development of the traditional binary resolution method and the detailed introduction of S-CS rule in the proposition logic and the first-order logic can be found in the published paper [27]. On the other hand, the specific deduction algorithms, strategies to support this theory, and its application have not yet been covered. How to design an effective S-CS deduction algorithm and apply it effectively to ATP is the main motivation of the present work, which is necessary and important.

In this paper, we focus on designing and applying an effective multi-clause dynamic deduction algorithm according to S-CS rule. By analyzing the characteristics of S-CS dynamic deduction, and in order to make the generated clauses contain few literals, two S-CS dynamic deduction algorithms based on optimized proof search are proposed. This proposed multi-clause dynamic deduction algorithms are applied to the state of the art ATP system Eprover, and the experimental results evaluation are introduced.

The remainder of the paper is structured as follows. In Section 2, the S-CS rule is reviewed and its deduction characteristics are analyzed in detail. In Section 3, the design of two multi-clause dynamic deduction algorithms according to S-CS rule are proposed. The application of the proposed algorithms in the state of the art ATP system is introduced in Section 4. Section 5 shows the performance of the improved ATP systems and gives detailed analysis. In last Section, the conclusions and future work are provided.

2. S-CS RULE

This section firstly provides a review of basic concepts of S-CS rule, then gives a detailed analysis of its deduction characteristics.

Definition 2.1.

[27] Suppose a clause set S=C1,C2,,Cm in first-order logic, where the following conditions hold:

  1. There does not exist the same variables among C1,C2,,Cm (if there exist the same variables, a rename substitution can be applied to make them different);

  2. For each Cii=1,2,,m, a substitution σi can be applied to Ci (σi could be an empty substitution) and the same literals merged after substitution, denoted as Ciσi; Ciσi can be partitioned into two sub-clauses Ciσi and Ciσi+ such that:

    1. Ciσi=CiσiCiσi+, where Ciσi and Ciσi+ do not share the same literal, Ciσi+ can be an empty clause, Ciσi cannot be an empty clause; moreover,

    2. For any x1,,xmi=1mCiσi, there exists at least one complementary pair among x1,,xm, i=1mCiσi is called a separated standard contradiction (S-SC);

    3. The resulting clause i=1mCiσi+, denoted as msσC1,,Cm (here “s” means “standard,” σ=i=1mσi), is called an S-CS clause of C1,C2,,Cm.

The inference rule that produces a new clause msσC1,,Cm is called an S-CS rule in first-order logic.

Definition 2.2.

[27] Suppose a clause set S=C1,C2,,Cm in first-order logic. Φ1,Φ2,,Φt is called an S-CS based dynamic deduction sequence from S to a clause Φt, if

  1. ΦiS; or

  2. there exist r1,r2,,rki<i, Φi=kisθiΦr1,Φr2,,Φrki.

Where θi=j=1kiσj, σj is a substitution to Φrj, j=1,2,3,,ki,i1,2,,t.

S-CS rule takes multiple (two or more) clauses as input, selects a subset of the literals from each input clause to build a standard contradiction, and infers the disjunction of the non-selected literals of the input clauses. In contrast to the chained inference rules, such as UR resolution, hyper-resolution, linear resolution, and conflict resolution, they can also be expressed in the form of handling multiple input clauses and generated a new clause. However, the essential difference is that only two clauses are involved and only a complementary pair of literals are eliminated in each deduction step for the chained inference rules, with regardless of the synergized effect between more than two clauses. S-CS rule handles multiple (two or more) input clauses by separating a standard contradiction from the input clauses in each deduction step. The standard contradiction is not simply multiple complementary pairs of literals, but a literal set as long as there are a complementary pair of literals. Therefore, in contrast to binary resolution method, S-CS rule has a stronger literals elimination ability in each deduction step.

Example 2.1.

Let S=C1,C2,C3,C4 be a clause set in first-order logic, where C1=P1aP2b,fb, C2=P2a,bP2b,fb, C3=~P1aP1b~P2x1,x2P3x1,x3, C4=P1b~P1x1~P2x2,x3~P3a,fb. Here a, b are constants, f is a function symbol, x1,x2,x3 are variables.

Applying S-CS rule to the 4 input clause C1,C2,C3,C4, we obtain an S-CS clause involving 4 clauses: C5=P1bP2b,fb, the S-SC is P1aP2a,b~P1a~P2a,bP3a,fb ~P1a~P2a,b~P3a,fb.

In the process of S-CS deduction for C1,C2,C3,C4 in sequence, when C3 as an input clause is involved, C3 is eliminated 2 literals by two complementary pairs of literals ~P1a,P1a, ~P2a/x1,b/x2,P2a,b, when C4 as an input clause is involved, C3 is eliminated 3 literals by three complementary pairs of literals ~P1a,P1a,~P2a/x1,b/x2,P2a,b, P3a/x1,fb/x3,~P3a,fb, it is quite different from binary resolution method in each deduction step. Therefore, S-CS dynamic deduction can achieve a greater degree of literals elimination than binary resolution method, and thus the S-CS clauses usually have fewer literals. As shown in Example 2.1, the four input clauses have 12 literals, and 8 literals are eliminated by separating a standard contradiction, generates an S-CS clause only with 2 literals after the same literals are merged.

According to the definition of S-CS rule and the illustration of Example 2.1, the S-CS deduction has multi-clause, dynamic, synergized, controllable, guided characteristics [28], so it is a challenge to design an effective S-CS dynamic deduction algorithm. For theorem proving by S-CS rule, the first input clause and the subsequent input clauses are optionally selected, and even the same input clause can be repeatedly participated in the process of S-CS deduction. The process of input clauses participating in S-CS deduction is dynamic, and the deduction can be halted through the number of input clauses or the measures of the S-CS clause generated in each deduction step. Because the S-CS dynamic deduction is performed by continuously selecting input clauses and generating S-CS clauses by separating different standard contradictions, the constructed standard contradiction has synergized deduction effect, and has a guidance effect on the selection of subsequent input clauses. Therefore, the dependency of clause selection strategy is reduced. At the same time, the clause selection of the S-CS deduction is controllable, that is, when an input clause participates in S-CS deduction and generates an S-CS clause, it can be evaluated whether the input clause satisfies the user-defined deduction requirements by the measures of the S-CS clause. When the requirements are not satisfied, the input clause can be reselected until the deduction meets the requirements. In addition, a selected literal from the input clause used for constructing a standard contradiction is also controllable, that is because different construction methods based on the literals from input clauses can separate different standard contradictions.

Example 2.2.

Let S=C1,C2,C3,C4,C5,C6 be a clause set in first-order logic, where C1=P1a1,a2, C2=~P1x22,x21P1x21,x22, C3=~P2x31,a2~P2x31,a1, C4=~P2x43,x41P1x41,x42~P2x43,x42, C5=~P1x51,x52P2f1x51,x52,x52, C6=~P1x61,x62P2f1x61,x62,x61. Here a1, a2 are constants, f1 is a function symbol, x21,x22,x31,x41,x42, x43,x51,x52,x61,x62 are variables.

In the process of S-CS dynamic deduction, the input clause allows for the absence of unification with the literals in the constructed standard contradiction, so the efficiency of S-CS deduction with the irregular selection of input clauses is very low. Generally, for the resolution calculus, clauses with a smaller number of literals usually have higher deduction efficiency, because their resolvent often contains fewer literals. Follow this principle and according to the characteristics of S-CS deduction, we use S-CS rule for C1,C2,C5,C6,C3, and obtain the S-CS clause:

C7=7sσ7C1,C2,C5,C6,C3=, where the corresponding S-SC is: P1a1,a2~P1a1/x22,a2/x21P1a2/x21,a1/x22 ~P1a1/x51,a2/x52P2f1a1/x51,a2/x52,a2/x52 ~P1a1/x61,a2/x62P2f1a1/x61,a2/x62,a1/x61 ~P2f1a1,a2/x31,a2~P2f1a1,a2/x31,a1.

Since the proof search for first-order logic theorem is very complicated and varied, it is difficult to design an S-CS dynamic deduction algorithm, which can quickly find refutation for all theorems proving. This paper starts from the efficiency of the S-CS deduction, and according to the dynamic, controllable characteristics of S-CS deduction, it preferentially generates S-CS clauses which contain a small number of literals, that is, the S-CS dynamic deduction algorithm proposed in this paper is to make the generation of S-CS clause a slowly increasing process in terms of the number of literals, which optimizes proof search in this way, such as the above illustration of Example 2.2. We call this algorithm for optimizing proof search but not reusing clause (the same input clause) as S-CS Dynamic Deduction Algorithm Based on Optimized Proof Search but not reusing clause, in short, SCSAN. Since S-CS rule allows repeatedly use input clauses to participate in the deduction, and take care of the literal from an input clause used for separating a standard contradiction may have multiple ways, which can perform a more in-depth proof search during the process of S-CS deduction. We call this algorithm for optimizing proof search but reusing clause as S-CS Dynamic Deduction Algorithm Based on Optimized Proof Search but reusing clause, in short, SCSA. We use S-CS rule for C1,C2,C5,C5,C6,C6,C3, and obtain the S-CS clause:

C8=8sσ8C1,C2,C5,C5,C6,C6,C3=, where the corresponding S-SC is: P1a1,a2~P1a1/x22,a2/x21P1a2/x21,a1/x22 ~P1a1/x51,a2/x52P2f1a1/x51,a2/x52,a2/x52 ~P1a2/x51,a1/x52P2f1a2/x51,a1/x52,a1/x52 ~P1a1/x61,a2/x62P2f1a1/x61,a2/x62,a1/x61 ~P1a2/x61,a1/x62P2f1a2/x61,a1/x62,a1/x62 ~P2f1a1,a2/x31,a2~P2f1a1,a2/x31,a1.

The design of the two algorithms are described in detail in this paper, and the comparative analysis of experimental results are given.

3. DESIGN OF S-CS DYNAMIC DEDUCTION ALGORITHM BASED ON OPTIMIZED PROOF SEARCH

The S-CS dynamic deduction selects input clauses in clause set, selects literals of the input clauses to construct standard contradictions, and generates S-CS clauses. According to the evaluation of the generated S-CS clause in each deduction step, it is decided whether the measures of the S-CS clause satisfies the user-defined requirement, if the requirements are not satisfied, it need to go back to the last deduction step and reselect a new input clause to participate in the deduction. Repeat this operation and generate S-CS clauses until the exit conditions are met. Therefore, before designing the S-CS dynamic deduction algorithm, it is necessary to determine how to select the input clause, how to select a literal in the input clause to construct a standard contradiction, how to evaluate a deduction step based on the S-CS clause, how to perform the retreat of the deduction step if current step does not satisfy the user-defined requirements, and how to exit S-CS deduction.

3.1. Clause Selection Strategy

Clause selection strategy is used to select an input clause for S-CS dynamic deduction, it is implemented mainly based on the following clause attributes:

  1. Acceptable deduction weight. Acceptable deduction weight of an input clause is the number of times that the clause has participated in an acceptable S-CS dynamic deduction. The acceptable deduction means the deduction generates an S-CS clause that meets the user-defined requirements (e.g., maximum number of literals, maximum term depth).

  2. Unacceptable deduction weight. Unacceptable deduction weight of an input clause is the number of times that the clause has participated in an unacceptable S-CS dynamic deduction. An unacceptable deduction means the deduction generates an S-CS clause that does not meet the user-defined requirements, and it need to go back to the last deduction step.

  3. The number of literals. The number of literals is a most important attribute of an input clause. When an input clause with few literals participates in S-CS dynamic deduction, it is likely that the generated S-CS clause also has few literals by separating a standard contradiction from the input clauses.

  4. Clause weight. Clause weight is used to characterize the statistics of symbols in an input clause. Set the initial weight of each symbol (including constant, variable, function) equal to 1, clause weight is equal to the sum of each symbol weight in the input clause.

  5. Function symbol weight. S-CS rule is a multi-clause deduction method, which can handle multiple input clauses (two or more) in each deduction step. Furthermore, one or more literals in an input clause can be used for constructing a standard contradiction, which causes the variable to be substituted for unification multiple times, thereby greatly increasing the generated S-CS clause weight, wherein the main reason is the variable substitution in a function symbol. Function symbol weight is used to characterize the statistics of symbols in a function symbol. Set the initial weight of each symbol (including constant, variable, function) equal to 1, function symbol weight is equal to the sum of each symbol weight in the function symbol.

Clause selection strategy selects an input clause to participate in S-CS dynamic deduction based on the priority order with multiple attributes of the clause. For example, the priority order is: The number of literals > Acceptable deduction weight > Unacceptable deduction weight. That means the clause with the smallest number of literals is preferentially selected. When the clauses have the same number of literals, then the clause with the smallest Acceptable deduction weight is preferentially selected. When the clauses have the same number of literals and the same Acceptable deduction weight, then the clause with the smallest Unacceptable deduction weight is preferentially selected. When the clauses have all the same of the three attributes, the clause is selected by the storage order.

3.2. Literal Selection Strategy

Literal selection strategy is used to select a literal in the input clause to construct a standard contradiction. The selected literal is different, and the constructed standard contradiction is also different. It is implemented mainly based on the following literal attributes:

  1. Acceptable deduction weight. Acceptable deduction weight of a literal is the number of times that the literal used for constructing standard contradictions has participated in an acceptable S-CS dynamic deduction.

  2. Unacceptable deduction weight. Unacceptable deduction weight of a literal is the number of times that the literal used for constructing standard contradictions has participated in an unacceptable S-CS dynamic deduction.

  3. The number of complementary predicates. In the process of S-CS multi-clause deduction, it is used to evaluate the potential ability of a literal used for separating standard contradictions. That is, if the predicate in a literal has a large number of complementary predicates in clause set, there will be many input clauses can be selected to participate in S-CS dynamic deduction.

  4. The number of variables. In the process of S-CS multi-clause deduction, the literal in an input clause used for constructing standard contradictions can be reused, therefore, if a literal contains more variables, the stronger ability of separating standard contradictions has.

  5. Literal type. If the same variable comes from different literals in a clause, then the variable is called a shared variable. The literal which contains shared variable is called a shared literal. Otherwise, the literal is called an independent literal. The literals are marked as independent or shared. In the process of S-CS dynamic deduction, an independent literal is used for separating standard contradictions, it will not affect the left literals form the same input clause which has added to S-CS clause.

Literal selection strategy selects a literal to separate standard contradictions based on the priority order with multiple attributes of the literal, the priority order is the same as that of clause selection strategy.

3.3. S-CS Deduction Step Evaluation Strategy

S-CS deduction step evaluation strategy is used to evaluate each S-CS deduction step, whether it is acceptable or unacceptable according to the attributes of the generated S-CS clause. This evaluation method is based on the assumption that the attribute of an S-CS clause exceeds the set requirement, it will not contribute to find refutation. If the attribute of an S-CS clause does not satisfy the set requirement, it needs to go back to the last deduction step, and reselects a new input clause to participate in the deduction. There are two attributes of the S-CS clause which is used to evaluate each deduction step.

  1. The maximum number of literals. It is used to control the maximum number of literals in S-CS clause. When the number of literals exceeds the set threshold, the S-CS dynamic deduction will reselect a new input clause to participate in the deduction, until the number of literals in the generated S-CS clause meets the threshold requirement.

  2. Maximum term depth. It is used to control the maximum term depth in S-CS clause. When the maximum term depth exceeds the set threshold, the S-CS dynamic deduction will reselect a new input clause to participate in the deduction, until the maximum term depth in the generated S-CS clause meets the threshold requirement.

3.4. S-CS Dynamic Deduction Algorithm Based on Optimized Proof Search

The S-CS deduction is dynamic and controllable, it is a process of multi-clause deduction which is combined with a lot of deduction steps. In each deduction step, the clause selection strategy and the literal selection strategy are used to select an input clause and select which literal from the input clause to construct a standard contradiction, and evaluate the deduction step whether it is acceptable or unacceptable by the attributes of the generated S-CS clause. In order to effectively optimize the proof search, an S-CS dynamic deduction algorithm for making the generation of S-CS clause a slowly increasing process in terms of the number of literals is proposed (SCSAN). At the same time, based on optimized proof search, taking care of the literal in an input clauses exists many ways to construct different standard contradictions, an S-CS dynamic deduction algorithm of the repeated use of the input clauses is also proposed (SCSA).

The pseudo-code for the algorithm is described in Figure 1 below. The corresponding algorithm is described in detail as follows:

Figure 1

The pseudo-code for the standard contradiction separation (S-CS) dynamic deduction algorithm.

Step 1: According to clause selection strategy, select an input clause in clause set (marked as initial input clause C0, which is the first selected clause), and a literal from the input clause C0 is selected according to literal selection strategy, and the literal is prepared for constructing a standard contradiction.

Step 2: Select next input clause in clause set according to clause selection strategy, and select a literal from the selected input clause to construct a standard contradiction. If all clauses in clause set have been searched or selected, go to Step 4.

Step 3: This step is used to search optimal an input clause. Apply S-CS rule with the selected clauses, separate a standard contradiction, and generate an S-CS clause, called as R. If R is an empty clause, go to Step 7. The number of literals in the generated S-CS clause is calculated (marked as K). If K value calculated for the first time, then it is temporarily recognized as the minimum value and the current input clause is recorded as optimal clause. Otherwise K is decided whether it is a minimum value (the minimum value was updated during the search process). If K is less than the minimum value, record the current input clause as optimal clause. Perform search backtracking. The input clause search backtracking algorithm (see the description below) which is used to go back to the last search step, go to Step 2.

Step 4: An optimal clause Cii=1,2,,n is search out which can make the generated S-CS clause has the minimum number of literals in the process of S-CS dynamic deduction. Apply S-CS rule for Ci and the selected input clauses, separate a standard contradiction and generate an S-CS clause, called as Rjj=1,2,,n. Rj is checked whether this deduction step is acceptable.

(1) Rj is a tautology. (2) Rj is a redundant clause. (3) The maximum term depth exceeds the set threshold.

If one of the above three conditions is satisfied, the S-CS dynamic deduction backtracking algorithm (see the description below) is performed, and go to Step 2.

The input clause search backtracking algorithm: (1) Clear the substitutions and the deduction identifier of the input clauses, which produced by the current input clause participated in deduction. (2) Remove the literals in the constructed standard contradiction and the S-CS clause, which come from the current input clause. (3) Mark the input clause that have been searched.

The S-CS dynamic deduction backtracking algorithm: (1) Clear the substitutions and the deduction identifier of the input clauses, which produced by clause Ci participated in deduction. (2) Remove the literals in the constructed standard contradiction and the S-CS clause, which come from clause Ci. (3) Unacceptable deduction weight of clause Ci is increased by 1, unacceptable deduction weight of the literals in clause Ci which are used for constructing a standard contradiction are increased by 1. (4) Clear the tags of the clauses which have been searched. (5) Mark the input clause that caused an unacceptable deduction.

Step 5: It is an acceptable S-CS deduction, the acceptable deduction weight of clause Ci is increased by 1, the acceptable deduction weight of literals in clause Ci which are used for constructing standard contradictions are increased by 1, and mark the deduction identifier of clause Ci. Clear the tags of the clauses which have been searched. The S-CS clause is added to the clause set S1 (initially empty) which is used to store the generated S-CS clause in each deduction step. Check the exit conditions (see the description below) for the S-CS dynamic deduction, if satisfied, go to Step 7. Check whether clause Ci contains substitution, if it contains and the S-CS deduction allows clause Ci to be reused, go to Step 6 (only for SCSA algorithm). Otherwise, go to Step 2.

Step 6: This is an optional step for reusing the same input clause in the process of S-CS dynamic deduction (only for SCSA algorithm). In order to avoid repeated proof search, record the proof search which caused by clause Ci. Clear substitutions of clause Ci and reconstruct clause Ci, go to Step 4.

Step 7: Satisfy the exit conditions, exit this S-CS dynamic deduction. The substitutions of all the input clauses are cleared.

The exit conditions are described as following:

(1) The S-CS clause is an empty clause. (2) The total number of input clauses reaches the set threshold. (3) The number of literals in S-CS clause reaches the set threshold. (4) The maximum term depth in S-CS clause reaches the set threshold. (5) The runtime reaches the set threshold. (6) The remaining memory reaches the set threshold.

The involved thresholds in the above S-CS dynamic deduction algorithm are set by parameters in the heuristic strategies. There are several special cases of S-CS dynamic deduction need to be further processed, they are described as following:

(1) If the clause C0 fails to perform an acceptable S-CS dynamic deduction, it is added into a special clause list in which has the lowest selection priority in the subsequent S-CS deduction. (2) If all the clauses in clause set as the first input clause cannot perform an acceptable S-CS dynamic deduction, automatically adjust the set thresholds of S-CS clause (number of literals, maximum term depth). (3) Internal simplification is performed on the S-CS clause set S1. (4) Apply backward simplification on the whole clause set using the clauses in S1, then each S-CS clause in clause set S1 is added in whole clause set.

The functions in the above pseudo-code are remarked as follows:

selectFirstInputClause(S): select the first input clause for S-CS dynamic deduction according to clause selection strategy, and mark the selected clause as an initial input clause.

selectNextInputClause(S): select an input clause for S-CS dynamic deduction according to clause selection strategy, the difference between selectFirstInputClause and selectNextInputClause is that the input clause selected by selectNextInputClause is to be used to construct a standard contradiction, but the input clause selected selectFirstInputClause has no restrictions.

extendedSplitContradiction(d, D): apply S-CS rule with the selected input clause d and the input clause set D, construct a standard contradiction and generate an S-CS clause.

findPriorInputClause(q, m): compared to m (m is the minimum number of literals in an S-CS clause which is generated in the previous search steps, the corresponding input clause is assumed to be h), if the number of literals in S-CS clause q is less than m, return current selected input clause. Otherwise, return clause h.

traverseComplete(S): Clause set S has been traversed, it is used to exit the search for the optimal input clause.

searchBacktracking(): It is The input clause search backtracking algorithm, which is used to go back to the last search step.

checkUnacceptable(q, S): check whether the S-CS clause q in clause set S is an unacceptable clause (lead to an unacceptable S-CS deduction), the conditions of an unacceptable clause are described in Step 4 above.

deductionBacktracking(w): It is The S-CS dynamic deduction backtracking algorithm, which is used to return to the last search step.

checkExitConditions(): check the exit conditions (described in Step 7) of S-CS dynamic deduction.

permitReuseClause(): Check whether the deduction allows the input clauses to be reused.

checkClause(w): check whether clause w contains substitution.

reconstructClause(w): reconstruct a new clause which does not contain substitution according to clause w.

inner_Simplification(S1): apply forward simplification, factoring rule, and equality resolution in S-CS clause set S1.

backward_Simplification(S, c): check whether the clauses in clause set S are redundant clauses using the clause c. If yes, delete the redundant clauses.

4. APPLICATION OF THE PROPOSED S-CS DYNAMIC DEDUCTION ALGORITHMS

In order to effectively evaluate the proposed two S-CS dynamic deduction algorithms based on optimized proof search, we apply the two multi-clause algorithms to the state of the art ATP system Eprover [19]. Eprover is a well-known equational automated theorem prover for first-order logic with powerful equality processing capability based on superposition [13] and rewriting [29,30], and has won the second place in the Conference on Automated Deduction (CADE) ATP System Competition (CASC) for many years since 2002 [3133], but its inference core still adopts the binary resolution methods. Eprover is also an open source ATP system in first-order logic, we choose the version E 2.3 [34] to combined with the proposed S-CS dynamic deduction algorithms (a combined ATP system), and evaluate its effectiveness. The running mode of the combined ATP system is: it runs the plain Eprover first at a certain time, then if Eprover does not find a proof, the proposed algorithm is applied to find refutation or generated lemmas in a given time. If the proposed algorithm does not find a proof, the filtered lemmas along with the original clause set are handed back to Eprover for further proof search in the left time. The total time of the three stages is the set total runtime for a problem testing. The runtime of the three stages are specified as strategy parameters.

The proposed S-CS deduction algorithms can generate a large number of S-CS clauses as lemmas in a given runtime, most of them are unit clauses because of the S-CS deduction based on optimized proof search. Adding a large number of lemmas will greatly increase the size of the clause set and reduce the efficiency of the third stage of proof search, so it is necessary to filter the generated lemmas. Lemmas filtration strategies are mainly achieved as follows:

  1. Setting a threshold on the number of literals in the lemmas, and feed to Eprover only the lemmas whose numbers of literals does not exceed the set threshold. In order to make a preliminary attempt, this threshold is fixed to 1.

  2. Setting a threshold on maximum term depth in the lemmas, and feed to Eprover only the lemmas whose maximum term depth does not exceed the set threshold.

  3. Setting a threshold on clause weight of the lemmas, and feed to Eprover only the lemmas whose clause weight does not exceed the set threshold.

  4. Requiring the lemmas to have at least one variable symbol.

  5. Requiring the lemmas to have at least one equality literal.

  6. Requiring the lemmas which are generated by a negated conjecture as an input clause.

The involved thresholds used for lemmas filtration are set by parameters in the heuristic strategies. For the convenience of description, the version of Eprover (E 2.3) which adds S-CS Dynamic Deduction Algorithm Based on Optimized Proof Search but not reusing clause, in short, called as SCSAN_E 2.3, and adds S-CS Dynamic Deduction Algorithm Based on Optimized Proof Search but reusing clause, in short, called as SCSA_E 2.3.

5. EXPERIMENTAL AND PERFORMANCE ANALYSIS

5.1. Experimental Setup

The two ATP systems SCSAN_E 2.3 and SCSA_E 2.3 are tested on FOF division (500 in total) of the CASC-J9 (in 2018) ATP System competition [35]. The experiments were carried on a PC with 3.6 GHz Inter(R) Core (TM) i7-4790 processor and 16GB memory, with OS Ubuntu15.04 64-bit, standard CPU time limit being 300 seconds. In order to confirm the correctness of proof procedure by S-CS dynamic deduction, Prover9 [36] check system is used to check each deduction step which is used for finding refutation. For experimental results comparisons, the plain E 2.3 is tested on the same hardware environment, and solved 362 problems.

5.2. Experimental Results and Analysis

(1) Performance analysis of SCSAN_E 2.3 on FOF division of the CASC-J9.

In order to evaluate the effectiveness of SCSAN, SCSAN_E 2.3 is tested on FOF division of the CASC-J9. Figure 2 shows the comparison on solved problems by SCSAN_E 2.3 and E 2.3.

Figure 2

Comparison on solved problems by SCSAN_E 2.3 and E 2.3.

From Figure 2, SCSAN_E 2.3 has solved 369 problems with 7 more than E 2.3 which has solved 362 problems. The average time spent for the 369 solved problems by SCSAN_E 2.3 is 23.79 seconds, and 20.14 seconds spent for the 362 solved problems by E 2.3. In special, the average time spent for 362 problems solved by SCSAN_E 2.3 is 18.95 seconds, 1.19 seconds less than that of E 2.3. Therefore, SCSAN_E 2.3 has better time efficiency than E 2.3 when solving the same number of problems.

According to the spent time point, within 150 seconds, SCSAN_E 2.3 has solved 350 problems with 12 more than E 2.3. Within 300 seconds, SCSAN_E 2.3 has solved 369 problems with 7 more than E 2.3. Experimental results show that SCSAN_E 2.3 outperformed E 2.3 in terms of the ability and time efficiency.

In contrast to the solutions by E 2.3, Table 1 lists the problems solved by SCSAN_E 2.3 but unsolved by E 2.3. For those 138 problems unsolved by E 2.3, SCSAN_E 2.3 solved 16 problems accounting for 11.6% of the total, with the average spent time is 226.57 seconds. Because these problems are relatively hard, so the time efficiency is acceptable. For those 16 solved problems, there are 2 problems with rating greater than or equal to 0.9, 12 problems with rating greater than 0.7, accounting for 12.5% and 75% of the total (listed in Table 1) respectively. Experimental results show that SCSAN_E 2.3 can solve some hard problems which unsolved by E 2.3.

Theorem Name Rating Number of Formulae Maximum Formula Depth Number of Variables Solved Time (Seconds)
AGT013+2 0.72 923 8 70 257.64
AGT018+1 0.62 556 8 71 207.86
AGT019+1 0.76 556 8 72 210.23
AGT022+1 0.83 556 8 72 275.84
AGT022+2 0.72 923 8 72 271.62
BOO109+1 0.57 3 6 9 225.23
GEO506+1 0.83 143 22 564 268.33
GEO511+1 0.9 162 22 650 217.74
NUM320+1 0.62 402 19 122 177.64
SWB012+1 0.79 560 27 977 278.57
SWB016+1 0.69 559 27 973 193.14
SWB027+1 0.79 560 27 976 176.3
SWB082+1 0.79 560 27 973 196.07
SWB094+1 0.79 560 27 973 194.13
SWB098+1 0.79 560 27 973 196.22
SWW189+1 0.9 1150 14 3058 278.63
Table 1

List of problems solved by SCSAN_E 2.3 but unsolved by E 2.3.

(2) Performance analysis of SCSA_E 2.3 on FOF division of the CASC-J9.

In order to evaluate the effectiveness of SCSA, SCSA_E 2.3 is also tested on FOF division of the CASC-J9. Figure 3 shows the comparison on solved problems by SCSA_E 2.3 and E 2.3.

Figure 3

Comparison on solved problems by SCSA_E 2.3 and E 2.3.

From Figure 3, SCSA_E 2.3 has solved 379 problems with 17 more than E 2.3 which has solved 362 problems. The average time spent for the 379 solved problems by SCSA_E 2.3 is 31.8 seconds, and 20.14 seconds spent for the 362 solved problems by E 2.3. In special, the average time spent for 362 problems solved by SCSA_E 2.3 is 21.9 seconds, 1.76 seconds more than that of E 2.3, so the time efficiency is acceptable.

According to the spent time point, within 200 seconds, SCSA_E 2.3 has solved 361 problems with 4 more than E 2.3. Within 250 seconds, SCSA_E 2.3 has solved 374 problems with 12 more than E 2.3. Within 300 seconds, SCSA_E 2.3 has solved 379 problems with 17 more than E 2.3. Experimental results show that SCSA_E 2.3 also outperformed E 2.3 in terms of the ability, and the time efficiency is slightly less than E 2.3.

In contrast to the solutions by E 2.3, Table 2 lists the problems solved by SCSA_E 2.3 but unsolved by E 2.3. For those 138 problems unsolved by E 2.3, SCSA_E 2.3 solved 21 problems accounting for 15.2% of the total, with the average spent time is 221.34 seconds, the time efficiency are also acceptable. For those 21 solved problems, there are 3 problems with rating greater than or equal to 0.9, 16 problems with rating greater than 0.7, accounting for 14.3%, and 76.2% of the total (listed in Table 2) respectively. Experimental results show that SCSA_E 2.3 also can solve some hard problems which unsolved by E 2.3.

Theorem Name Rating Number of Formulae Maximum Formula Depth Number of Variables Solved Time (Seconds)
AGT013+2 0.72 923 8 70 291.47
AGT018+1 0.62 556 8 71 186.88
AGT019+1 0.76 556 8 72 235.29
AGT022+1 0.83 556 8 72 248.56
AGT022+2 0.72 923 8 72 241.11
BOO109+1 0.57 3 6 9 258.08
GEO506+1 0.83 143 22 564 263.49
GEO511+1 0.9 162 22 650 233.32
GRA002+1 0.76 18 13 68 234.35
NUM320+1 0.62 402 19 122 161.75
SET722+4 0.52 29 19 138 217.39
SEU252+2 0.93 311 14 661 218.75
SWB027+1 0.79 560 27 976 186.34
SWB082+1 0.79 560 27 973 190.12
SWB088+1 0.83 560 27 973 174.19
SWB094+1 0.79 560 27 973 170.09
SWB098+1 0.79 560 27 973 173.14
SWB104+1 0.83 560 27 975 213.75
SWW189+1 0.9 1150 14 3058 247.04
SWW470+6 0.83 623 19 2150 233.65
SYO606+1 0.29 9 14 55 269.32
Table 2

List of problems solved by SCSA_E 2.3 but unsolved by E 2.3.

(3) Comprehensive analysis of the two systems SCSAN_E 2.3 and SCSA_E 2.3.

According to the solutions of SCSAN_E 2.3 and SCSA_E 2.3 on FOF division of the CASC-J9, we can see that both SCSAN_E 2.3 and SCSA_E 2.3 outperform E 2.3, and SCSA_E 2.3 outperforms SCSAN_E 2.3. More interesting is that SCSAN_E 2.3 solves 16 unsolved problems by E 2.3, and SCSA_E 2.3 solves 21 unsolved problems by E 2.3, 14 problems of them are the same. The comprehensive analysis shows that SCSAN_E 2.3 and SCSA_E 2.3 have different main points in the proof search, which SCSAN_E 2.3 uses more input clauses in clause set to participate in S-CS deduction, but the full use of each input clause is insufficient, and SCSA_E 2.3 uses a relatively small number of input clauses in clause set to participate in S-CS deduction, but the full use of each input clause is relatively sufficient. Furthermore, the problems which solved by SCSA_E 2.3 but unsolved by E 2.3 basically include that of SCSAN_E 2.3. Therefore, the full use of each input clause to participate in S-CS dynamic deduction is a good deduction method for S-CS rule.

The experimental analysis shows that the proposed S-CS dynamic deduction algorithm based on optimized proof search is an effective method, which is an effective complement to the binary resolution method, and can be effectively applied to the first-order logic problem proving. The effectiveness of S-CS deduction is mainly reflected in:

  1. In contrast to binary resolution methods, S-CS rule is a multi-clause dynamic deduction method, which improve binary resolution method only eliminating a complementary pairs of literals (two literals) to an S-CS (a literal set), so it has a strong literal elimination ability, and the S-CS clauses have few literals, thus improve the deduction efficiency.

  2. In contrast to binary resolution methods, S-CS rule can handle multiple (two or more) clauses in each deduction step, it has advantageous for handling input clauses with large number of literals.

  3. The current advanced ATP systems adopt binary resolution methods under saturation algorithm, and the proof search of S-CS dynamic deduction can effectively supplement the proof search of binary resolution methods, such as the deduction description of Example 2.1.

  4. The proposed S-CS dynamic deduction algorithms preferentially generate S-CS clauses with a small number of literals, which is not equivalent to binary resolution methods using input clause with small number of literals first. That is because S-CS rule handles an input clause by separating a standard contradiction, it does not limit the number of literals in the input clause in S-CS deduction.

6. CONCLUSIONS AND FUTURE WORK

S-CS rule is a novel inference method for automated reasoning, its advantage is that it can handle multiple input clauses in each deduction step and generate S-CS clauses by separating standard contradictions, which are not the same as multiple complementary pairs of literals. Due to the dynamic characteristic of S-CS deduction and the challenges of proof search, the design of the S-CS dynamic deduction algorithm is further discussed. According to theory and practice analysis, two S-CS dynamic deduction algorithms including corresponding heuristic strategies are proposed in this paper, they can preferentially generate S-CS clauses with a small number of literals, and one of them can give full play to the ability of synergized deduction with the same input clause. The proposed two S-CS dynamic deduction algorithms are applied to the state of the art ATP system Eprover 2.3, which are called as SCSAN_E 2.3 and SCSA_E 2.3 respectively. The two ATP systems are tested on the FOF division of the CASC-J9. Both SCSAN_E 2.3 and SCSA_E 2.3 show excellent performance, which outperform the plain E 2.3 to a certain extent. Experimental results show that S-CS rule is an important complement to the binary resolution method, and the proposed S-CS dynamic deduction algorithm can effectively improve the capability of the state of the art ATP systems, and can be effectively applied to the first-order logic problem proving.

Although different kinds of heuristic strategies has proposed for SCSAN and SCSA, and deduction backtracking algorithm is used for effectively optimizing proof search, there is still room for improvement, such as how to effectively select an input clause and a literal to separate a standard contradiction, how to further exert the synergized ability of multiple input clauses, and how to effectively filter lemmas for the further proof search. Because the clause selection strategy, the literal selection strategy and lemmas filtration strategy are foundational implementation, effective use and further development of these strategies are the main future research work. The proposed S-CS dynamic deduction algorithm does not include equality handling, adding the rewriting method is also a future research work.

CONFLICT OF INTEREST

There is no conflict of interest.

AUTHORS' CONTRIBUTIONS

Feng Cao has proposed the main idea of the paper, including the algorithm implementation, application and experimental evaluation. Yang Xu, Shuwei Chen, Jian Zhong, Guanfeng Wu have contributed with a lot of good suggestions for the algorithm implementation. In addition, Shuwei Chen has contributed with some writing, review and helpful comments to further enhance the quality of the paper.

ACKNOWLEDGMENTS

This paper is supported by the National Natural Science Foundation of China (Grant No.61673320).

REFERENCES

2.A. Quaife, Automated Development of Fundamental Mathematical Theories, Kluwer Academic Publishers, 1992.
23.X.H. Liu, A new semantic resolution principle, J. Jilin Univ., Vol. 23, 1978, pp. 112-117. (in Chinese) http://www.cnki.com.cn/Article/CJFD1979-JLDX197802012.htm
26.S. Schulz, E-a brainiac theorem prover, AI Commun., Vol. 15, 2002, pp. 111-126. https://content.iospress.com/articles/ai-communications/aic260
34.S. Schulz, The source of E 2.3, 2019. downloaded on https://wwwlehre.dhbw-stuttgart.de/~sschulz/E/E.html
35.G. Sutcliffe, The CADE ATP system competition, 2018. http://www.tptp.org/CASC/J9
36.W.W. McCune, Prover9 and Mace4, 2017. downloaded on http://www.cs.unm.edu/~mccune/prover9/
Journal
International Journal of Computational Intelligence Systems
Volume-Issue
12 - 2
Pages
1245 - 1254
Publication Date
2019/11/12
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891
DOI
10.2991/ijcis.d.191022.002How to use a DOI?
Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

Cite this article

TY  - JOUR
AU  - Feng Cao
AU  - Yang Xu
AU  - Shuwei Chen
AU  - Jian Zhong
AU  - Guanfeng Wu
PY  - 2019
DA  - 2019/11/12
TI  - A Contradiction Separation Dynamic Deduction Algorithm Based on Optimized Proof Search
JO  - International Journal of Computational Intelligence Systems
SP  - 1245
EP  - 1254
VL  - 12
IS  - 2
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.d.191022.002
DO  - 10.2991/ijcis.d.191022.002
ID  - Cao2019
ER  -