http://stats.stackexchange.com/questions/47235/decision-theory-reject-option
In decision theory, we can define a reject option (with threshold $\theta$) so that when making a decision is difficult, the case is ignored.

Suppose:

- if $\theta < 1/K$, then no cases will be rejected, and
- if $\theta = 1$, then all cases will be rejected.

Why does this happen, and how can I prove it for $\theta = 1$ and $\theta < 1/K$?
Section 1.5.3 (the reject option), Chapter 1 of Pattern Recognition and Machine Learning by Bishop:
We have seen that classification errors arise from the regions of input space where the largest of the posterior probabilities p(Ck|x) is significantly less than unity, or equivalently where the joint distributions p(x, Ck) have comparable values. These are the regions where we are relatively uncertain about class membership. In some applications, it will be appropriate to avoid making decisions on the difficult cases in anticipation of a lower error rate on those examples for which a classification decision is made. This is known as the reject option. For example, in our hypothetical medical illustration, it may be appropriate to use an automatic system to classify those X-ray images for which there is little doubt as to the correct class, while leaving a human expert to classify the more ambiguous cases. We can achieve this by introducing a threshold θ and rejecting those inputs x for which the largest of the posterior probabilities p(Ck|x) is less than or equal to θ. This is illustrated for the case of two classes, and a single continuous input variable x, in Figure 1.26. Note that setting θ = 1 will ensure that all examples are rejected, whereas if there are K classes then setting θ < 1/K will ensure that no examples are rejected. Thus the fraction of examples that get rejected is controlled by the value of θ.
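To make the rule concrete, here is a minimal sketch of the reject rule Bishop describes; the function name `classify_with_reject` and the example posterior values are my own illustrative assumptions, not from the book.

```python
import numpy as np

def classify_with_reject(posteriors, theta):
    """Return the predicted class index, or None to reject.

    posteriors : 1-D array of p(C_k|x) over the K classes (sums to 1).
    theta      : reject threshold; reject when max_k p(C_k|x) <= theta.
    """
    k = int(np.argmax(posteriors))
    if posteriors[k] <= theta:
        return None  # too uncertain: leave this case to a human expert
    return k

# Example: K = 3 classes, one confident case and one ambiguous case.
print(classify_with_reject(np.array([0.90, 0.07, 0.03]), theta=0.8))  # -> 0
print(classify_with_reject(np.array([0.40, 0.35, 0.25]), theta=0.8))  # -> None
```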
An input $\mathbf{x}$ is rejected if the largest posterior probability satisfies $\max_k p(C_k|\mathbf{x}) \leq \theta$.
There are $K$ classes and the posterior probabilities must sum to one:

$$\sum_{k=1}^{K} p(C_k|\mathbf{x}) = 1.$$

A consequence of this is that

$$\max_k p(C_k|\mathbf{x}) \geq \frac{1}{K},$$

as otherwise the sum of the probabilities would necessarily be less than $1$. This means that if $\theta < 1/K$ the input is never rejected, since there is always at least one posterior probability that is larger than $\theta$.
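The lower bound can be spelled out by contradiction; a short derivation (my own elaboration of the step above):

```latex
\[
\text{If } \max_k p(C_k \mid \mathbf{x}) < \frac{1}{K}
\text{, then }
\sum_{k=1}^{K} p(C_k \mid \mathbf{x})
\;\le\; K \max_k p(C_k \mid \mathbf{x})
\;<\; K \cdot \frac{1}{K} \;=\; 1,
\]
```

which contradicts the normalization $\sum_k p(C_k|\mathbf{x}) = 1$; hence the maximum is at least $1/K$.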
Similarly, we have that

$$\max_k p(C_k|\mathbf{x}) \leq 1,$$

with equality only for inputs $\mathbf{x}$ that belong to a single class with certainty. In either case the maximum posterior probability never exceeds $1$, so if $\theta = 1$ the rejection criterion $\max_k p(C_k|\mathbf{x}) \leq \theta$ holds for every input, and all inputs are rejected.
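As a sanity check, a small simulation (my own sketch; the Dirichlet sampling and variable names are illustrative assumptions) confirms both edge cases on random posterior vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4
# Random valid posterior vectors: non-negative, each row sums to 1.
posteriors = rng.dirichlet(np.ones(K), size=10_000)
max_post = posteriors.max(axis=1)

def reject_fraction(theta):
    # Reject whenever the largest posterior is <= theta.
    return np.mean(max_post <= theta)

print(reject_fraction(1 / K - 1e-9))  # theta < 1/K -> 0.0 (nothing rejected)
print(reject_fraction(1.0))           # theta = 1   -> 1.0 (everything rejected)
```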