Slides from Università Di Pisa about Rule-based Classifiers. The Pdf introduces rule-based classifiers in data mining, explaining the greedy approach for rule extraction and evaluation metrics like Accuracy, Laplace, and M-estimate. This University Computer science material is well-structured for autonomous study.
See more36 Pages


Unlock the full PDF for free
Sign up to get full access to the document and start transforming it with AI.
Anna Monreale
Computer Science Department
Introduction to Data Mining, 2nd Edition
Chapter 5.1
UNIVERSITÀ DI PISA
IN SUPREN
GNITATIS .
.
1343Rule-based Classifier
. Classify records by using a collection of "if ... then ... "
rules
· Rule:
(Condition) -> y
- where
· Condition is a conjunction of tests on attributes
· y is the class label
- Examples of classification rules:
. (Blood Type=Warm) ^ (Lay Eggs=Yes) -> Birds
· (Taxable Income < 50K) ^ (Refund=Yes) -> Evade=No
UNIVERSITÀ DI PISA
REMÆ DIG
IN SUPRE
.
1343
R1: (Give Birth = no) ^ (Can Fly = yes) -> Birds
R2: (Give Birth = no) A (Live in Water = yes) -> Fishes
R3: (Give Birth = yes) ^ (Blood Type = warm) -> Mammals
R4: (Give Birth = no) A (Can Fly = no) -> Reptiles
R5: (Live in Water = sometimes) -> Amphibians
| Name | Blood Type | Give Birth | Can Fly | Live in Water | Class |
| human | warm | yes | no | no | mammals |
| python | cold | no | no | no | reptiles |
| salmon | cold | no | no | yes | fishes |
| whale | warm | yes | no | yes | mammals |
| frog | cold | no | no | sometimes | amphibians |
| komodo | cold | no | no | no | reptiles |
| bat | warm | yes | yes | no | mammals |
| pigeon | warm | no | yes | no | birds |
| cat | warm | yes | no | no | mammals |
| leopard shark | cold | yes | no | yes | fishes |
| turtle | cold | no | no | sometimes | reptiles |
| penguin | warm | no | no | sometimes | birds |
| porcupine | warm | yes | no | no | mammals |
| eel | cold | no | no | yes | fishes |
| salamander | cold | no | no | sometimes | amphibians |
| gila monster | cold | no | no | no | reptiles |
| platypus | warm | no | no | no | mammals |
| owl | warm | no | yes | no | birds |
| dolphin | warm | yes | no | yes | mammals |
| eagle | warm | no | yes | no | birds |
UNIVERSITÀ DI PISA
IN SUPRE
EMÆ DIG
GNITATIS .
.
1343
R1: (Give Birth = no) ^ (Can Fly = yes) -> Birds
R2: (Give Birth = no) A (Live in Water = yes) -> Fishes
R3: (Give Birth = yes) A (Blood Type = warm) -> Mammals
R4: (Give Birth = no) ^ (Can Fly = no) -> Reptiles
R5: (Live in Water = sometimes) -> Amphibians
| Name | Blood Type | Give Birth | Can Fly | Live in Water | Class |
| hawk | warm | no | yes | no | ? |
| grizzly bear | warm | yes | no | no | ? |
The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal
UNIVERSITÀ DI PISA
IN SUPRE
.
EMÆ DIG
GNITATIS .
1343
| Tid | Refund | Marital Status | Taxable Income | Class |
| 1 | Yes | Single | 125K | No |
| 2 | No | Married | 100K | No |
| 3 | No | Single | 70K | No |
| 4 | Yes | Married | 120K | No |
| 5 | No | Divorced | 95K | Yes |
| 6 | No | Married | 60K | No |
| 7 | Yes | Divorced | 220K | No |
| 8 | No | Single | 85K | Yes |
| 9 | No | Married | 75K | No |
| 10 | No | Single | 90K | Yes |
(Status=Single) -> No
Coverage = 40%, Accuracy = 50%
UNIVERSITÀ DI PISA
IN SUPREM
GNITATIS .
.
1343
R1: (Give Birth = no) A (Can Fly = yes) -> Birds
R2: (Give Birth = no) ^ (Live in Water = yes) -> Fishes
R3: (Give Birth = yes) A (Blood Type = warm) -> Mammals
R4: (Give Birth = no) A (Can Fly = no) -> Reptiles
R5: (Live in Water = sometimes) -> Amphibians
| Name | Blood Type | Give Birth | Can Fly | Live in Water | Class |
| lemur | warm | yes | no | no | ? |
| turtle | cold | no | no | sometimes | ? |
| dogfish shark | cold | yes | no | yes | ? |
A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
EMÆ DIG
UNIVERSITÀ DI PISA
IN SUPRE
.
GNITATIS .
1343
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
GNITATIS .
.
1343
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
GNITATIS .
.
1343
R1: (Give Birth = no) ^ (Can Fly = yes) -> Birds
R2: (Give Birth = no) ^ (Live in Water = yes) -> Fishes
R3: (Give Birth = yes) A (Blood Type = warm) -> Mammals
R4: (Give Birth = no) A (Can Fly = no) -> Reptiles
R5: (Live in Water = sometimes) -> Amphibians
| Name | Blood Type | Give Birth | Can Fly | Live in Water | Class |
| turtle | cold | no | no | sometimes | ? |
UNIVERSITÀ DI PISA
REMA DIG
IN SUPRE
GNITATIS .
.
1343
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
GNITATIS .
.
1343
(Refund=Yes) == > No
(Refund=No, Marital Status={Single, Divorced},
Taxable Income<80K) == > No
(Refund=No, Marital Status={Single, Divorced},
Taxable Income>80K) == > Yes
(Refund=No, Marital Status={Married}) == > No
UNIVERSITÀ DI PISA
IN SUPRE
EMÆ DIG
GNITATIS .
.
1343
(Refund=Yes) == > No
(Refund=No, Marital Status={Single, Divorced},
Taxable Income<80K) == > No
(Refund=No, Marital Status={Married}) == > No
(Refund=No, Marital Status={Single, Divorced},
Taxable Income>80K) == > Yes
UNIVERSITÀ DI PISA
IN SUPRE
EMÆ DIG
GNITATIS .
.
1343
UNIVERSITÀ DI PISA
IN SUPRE
.
EMÆ DIG
GNITATIS .
1343
UNIVERSITÀ DI PISA
IN SUPRE
.
EMÆ DIG
GNITATIS .
1343
-
+
-
-
+ +
-
+ +
-
-
-
-
+
-
1
+
+
+
+
+
+
+
(i) Original Data
I
+ +
-
+
+ :
I
-
I
+
+
+
+
+
-
(ii) Step 1
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
GNITATIS .
.
1343
+
+
+
+ +
R1
-
-
+
-
+
+
+
+
+
+
+
(iii) Step 2
-
R1
-
-
+
R2
++
+
(iv) Step 3
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
.
1343
GNITATIS .
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
GNITATIS .
.
1343
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
GNITATIS .
.
1343
Two common strategies:
The initial rule contains the same conjuncts
as the attribute values a selected record
Yes: 3
No: 4
Refund=No,
Status=Single,
Income=85K
(Class=Yes)
Refund=No,
Status=Single,
Income=90K
(Class=Yes)
Refund=
No
Status =
Single
Status =
Divorced
Status =
Married
.
Income
> 80K
Yes: 3
No: 4
Yes: 2
No: 1
Yes: 1
No: 0
Yes: 0
No: 3
Yes: 3
No: 1
(a) General-to-specific
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
.
GNITATIS .
1343
Refund=No,
Status = Single
(Class = Yes)
(b) Specific-to-general
n
c
n
- Accuracy =
n +1
- Laplace
=
n+ k
nc + kp
- M-estimate
n+k
n : Number of instances
covered by rule
n : Number of instances
covered by rule with class c
k : Number of classes
p : Prior probability
UNIVERSITÀ DI PISA
. IN SUPREN
GNITATIS .
1343
–
Gain(Ro, R1) = P1x[ log2
-
Po
P1 + n1
-) - log2
-
Po + no
)]
- Po: number of positive instances covered by RO
no: number of negative instances covered by RO
P1: number of positive instances covered by R1
n1: number of negative instances covered by R1
FOIL: First Order Inductive Learner - an early rule-based learning algorithm
UNIVERSITÀ DI PISA
IN SUPRE
.
EMÆ DIG
GNITATIS .
1343
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
GNITATIS .
.
1343
k
R= 2
i=1
filog(fi/ei),
Dataset: 60 positive records and 100 negative records
Rule 1: covers 50 positive records and 5 negative examples,
Rule 2: covers 2 positive records and no negative examples.
UNIVERSITÀ DI PISA
EMÆ DIG
IN SUPRE
GNITATIS .
.
1343