http://stats.stackexchange.com/questions/47235/decision-theory-reject-option

 

 

 

 

 

 

In decision theory, we define a reject option (θ ) so that when making decision is difficult, the case will be ignored.

Suppose 1/kθ1 :

  • If θ=1/k no cases will be rejected

    and

  • If θ=1 all cases will be rejected.

Why this happes and how can I prove it for θ=1 and 1/k ?

Section 1.5.3, chapter 1 of Pattern recognition by Bishop: (reject option)

We have seen that classification errors arise from the regions of input space where the largest of the posterior probabilities p(Ck|x) is significantly less than unity, or equivalently where the joint distributions p(x, Ck) have comparable values. These are the regions where we are relatively uncertain about class membership. In some applications, it will be appropriate to avoid making decisions on the difficult cases in anticipation of a lower error rate on those examples for which a classification decision is made. This is known as the reject option. For example, in our hypothetical medical illustration, it may be appropriate to use an automatic system to classify those X-ray images for which there is little doubt as to the correct class, while leaving a human expert to classify the more ambiguous cases. We can achieve this by introducing a threshold θ and rejecting those inputs x for which the largest of the posterior probabilities p(Ck|x) is less than or equal to θ. This is illustrated for the case of two classes, and a single continuous input variable x, in Figure 1.26. Note that setting θ = 1 will ensure that all examples are rejected, whereas if there are K classes then setting θ < 1/K will ensure that no examples are rejected. Thus the fraction of examples that get rejected is controlled by the value of θ.

 

 

 

 

 

 

 

 

 

An input x is rejected if the largest posterior probability maxi=1,,kP(belongs to class i|x)θ .

There are k classes and the probabilities must sum to 1:

i=1kP(belongs to class i|x)=1.

A consequence of this is that

maxi=1,,kP(belongs to class i|x)1/k

as the sum of probabilities otherwise necessarily must be less than 1 . This means that if θ=1/k the input is never rejected, since there is at least one posterior probability that is larger than θ .

 

Similarly, we have that

maxi=1,,kP(belongs to class i|x)1

with equality only if there are x that only can belong to one class. If no such x can exist,

maxi=1,,kP(belongs to class i|x)<1

which means that if θ=1 all inputs are rejected, since the maximum posterior probability always is less than 1 .

 

 

 

 

 

 

Posted by uniqueone
,