#32
Numbers to think about
On 30 Jul 2006 10:39:17 -0700, "CowPunk" wrote:

> real numbers: 12100 tests performed, 3.8% positive

That we know. However, we do not know how many of the 3.8% positives are real and how many are false. To know that, we would have to know exactly what this 1% error rate represents, and exactly how many clean and dirty samples were among the 12100 tests performed.

Say we had 10000 samples, we knew that 5000 were dirty and 5000 were clean, and we had a test that was 99% accurate, with the 1% inaccuracy evenly divided between false positives and false negatives. The most likely results would come out as 5000 negatives and 5000 positives. However, of those 5000 negatives, only 4950 would be real negatives and 50 would be false negatives. And the 5000 positives would be 4950 real positives and 50 false positives. In this sample, only 1% of the positives are false.

Now take the same situation where 9000 of the samples are clean and 1000 are dirty. You would most likely get 8920 negatives and 1080 positives. The 8920 negatives would include 8910 real negatives and 10 false negatives. The 1080 positives would include 90 false positives and 990 real positives. In this sample, around 8.3% of the positives are false.

Finally, take the situation where 9900 of the samples are clean and 100 are dirty. You would most likely get 9802 negatives and 198 positives. The 9802 negatives would include 9801 real negatives and 1 false negative. The 198 positives would include 99 real positives and 99 false positives. In this sample, 50% of the positives are false.

So you can see that the percentage of false positives among the total number of positives depends on the ratio of dirty to clean samples you're testing. The higher the percentage of clean samples in the tests, the higher the percentage of false positives in the results.
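The three scenarios above follow directly from the base rates. A short Python sketch (assuming the post's hypothetical 99% accuracy, with errors split evenly between false positives and false negatives) reproduces the numbers:

```python
# Expected outcomes for a symmetric 99%-accurate test, for the three
# hypothetical clean/dirty mixes discussed in the post above.
def positive_breakdown(n_clean, n_dirty, accuracy=0.99):
    """Return expected true positives, false positives, and the share
    of all positive results that are false."""
    error = 1.0 - accuracy
    true_pos = n_dirty * accuracy   # dirty samples correctly flagged
    false_pos = n_clean * error     # clean samples wrongly flagged
    false_share = false_pos / (true_pos + false_pos)
    return true_pos, false_pos, false_share

for n_clean, n_dirty in [(5000, 5000), (9000, 1000), (9900, 100)]:
    tp, fp, share = positive_breakdown(n_clean, n_dirty)
    print(f"{n_clean} clean / {n_dirty} dirty: {tp:.0f} true positives, "
          f"{fp:.0f} false positives, {share:.1%} of positives are false")
```

This prints 1.0%, 8.3%, and 50.0% for the three mixes, matching the hand calculations above.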
#33
Numbers to think about
On Sun, 30 Jul 2006 16:08:11 -0400, Jack Hollis wrote:

> So you can see that the percentage of false positives among the total
> number of positives depends on the ratio of dirty to clean samples
> you're testing. The higher the percentage of clean samples in the
> tests, the higher the percentage of false positives in the results.

All this being said, the probability of FL's sample being a false positive is still 1 out of 100.
#34
Numbers to think about
Jack Hollis wrote:

>> So you can see that the percentage of false positives among the
>> total number of positives depends on the ratio of dirty to clean
>> samples you're testing. The higher the percentage of clean samples
>> in the tests, the higher the percentage of false positives in the
>> results.
>
> All this being said, the probability of FL's sample being a false
> positive is still 1 out of 100.

It is true (assuming the 99% accuracy) that the probability of any clean sample testing positive is 1 in 100. However, once you have a positive result, this is meaningless. The question you must ask becomes "is this positive a false one?" The probability of that is variable, and it depends on the total number of samples and the total number of positives.

For every 100 clean samples, there will be 1 false positive. So if there are only 100 samples, all of them clean, and there is only 1 positive, the chance that it is a false positive is not 1 out of 100; it is very likely a false positive.

Ed
#35
Numbers to think about
Montesquiou wrote:

> "Jack Hollis" wrote in the news message:
>
>> You do need help. If the probability of error is 1%, then there will
>> be 120 errors in a sample of 12000.
>
> Yes, 120 errors, but on both sides:
>
> errors among the negatives: (12000 - 380) * 1% = 116.2
> errors among the positives: 380 * 1% = 3.8
>
> 3.8 + 116.2 = 120 errors.
>
> As you can see, the error rate is always 1%: 3.8 out of 380 = 1%, and
> 116.2 out of 11620 = 1%. By chance ;)

This is meaningless, because a real test will usually have a different chance of a "false positive" than of a "false negative." For a trivial example (not restricted to drug testing; it could be disease testing, genetic screening, etc.), suppose that the test measures some number, and that we call a result positive when it is greater than some threshold X. We define X such that the chance of a false positive is small, i.e. the vast majority of people have results below X. If we later decide that the threshold was set too high or too low, we can change X.

If we move X down, the number of false negatives goes down (we catch more of the guilty) and the number of false positives goes up (we catch more of the innocent). That is greater sensitivity and lower specificity. If we move X up, the opposite happens. The Wikipedia link that Ernst Noch posted, http://en.wikipedia.org/wiki/Binary_classification, gives more detail.

Anyway, the point is that saying "the test has a 1% error rate" is not enough information; you need to talk about the kinds of errors.
#36
Numbers to think about
On Sun, 30 Jul 2006 22:18:48 GMT, Ed wrote:

>> All this being said, the probability of FL's sample being a false
>> positive is still 1 out of 100.
>
> It is true (assuming the 99% accuracy) that the probability of any
> clean sample testing positive is 1 in 100. However, once you have a
> positive result, this is meaningless. The question you must ask
> becomes "is this positive a false one?" The probability of that is
> variable, and it depends on the total number of samples and the total
> number of positives.

No matter how you want to look at it, if a test is 99% accurate, the probability of any single clean sample showing up as a false positive is 1%.

> For every 100 clean samples, there will be 1 false positive. So if
> there are only 100 samples, and there is only 1 positive, the chance
> that it is a false positive is not 1 out of 100; it is very likely a
> false positive.

That assumes that all 100 samples are clean. I already set up a situation (9900 clean, 100 dirty) where 50% of the positives are false. However, that doesn't mean a positive result is likely to be wrong half the time. The reason there are so many false positives in the final results of this hypothetical is that 99% of the samples are clean.

It is tempting to say that because we know 50% of the positive results in the group are false, the chance of any one positive result being false is 50%. However, you have to remember that the probability of any one sample ending up as a real positive or a false positive is not equal. The chance of a sample from the dirty group testing positive is 99%, and the chance of a clean sample testing positive is 1%. The reason that 50% of the positives are false is that 99% of the samples are clean. The chance of a clean sample falling into the false positive group will always be 1%. No matter how you look at it, that basic fact does not change.

I remember OJ's lawyer making a similar argument when the prosecution said that a sample of blood found at the scene was OJ's blood type, with a genetic marker found in only 1 out of 200 people. The lawyer immediately brought up the hypothetical of a full football stadium with 70,000 people. He asked the prosecution witness how many people in the stadium would have the same blood type and genetic marker as the blood found at the scene. The witness said 350. OJ's attorney then said that in one football stadium there are 350 people who could have committed the crime, and went on to point out that there are tens of thousands of people in LA who could have committed it. It sounds a lot better to say that OJ is only one of tens of thousands of people who could have committed the crime than to say that the chance of the killer being anyone other than OJ is 200 to 1.

In a similar vein, it sounds better to present a positive result as being in a group where half of the positive results are false than to point out that the chance of any one clean sample being a false positive is 1%.
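For what it's worth, the 9900-clean / 100-dirty hypothetical can be simulated directly. The 99% accuracy figure is the thread's assumption, and the exact counts will wobble a little from run to run:

```python
# Monte Carlo version of the 9900 clean / 100 dirty scenario with a
# symmetric 99%-accurate test (the thread's hypothetical numbers).
import random

random.seed(1)
ACCURACY = 0.99

true_pos = false_pos = 0
for _ in range(9900):                  # clean samples
    if random.random() > ACCURACY:     # 1% chance of a false positive
        false_pos += 1
for _ in range(100):                   # dirty samples
    if random.random() <= ACCURACY:    # 99% chance of a true positive
        true_pos += 1

share = false_pos / (true_pos + false_pos)
print(f"{true_pos} true positives, {false_pos} false positives")
print(f"share of positives that are false: {share:.1%}")
```

Both counts come out near 99, so roughly half of all positives are false, even though each individual clean sample had only a 1% chance of being flagged.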
#37
Numbers to think about
Jack Hollis wrote:

>> For every 100 clean samples, there will be 1 false positive. So if
>> there are only 100 samples, and there is only 1 positive, the chance
>> that it is a false positive is not 1 out of 100; it is very likely a
>> false positive.
>
> That assumes that all 100 samples are clean.

Yes, that was the setup. Suppose you did not know the number of clean or dirty samples, and 1 out of 100 tested samples was positive. Continuing with the assumption that the test is 99% accurate, is that person clean or dirty? Remember, if they were all clean, you would still expect one positive.

> I already set up a situation (9900 clean, 100 dirty) where 50% of the
> positives are false. However, that doesn't mean a positive result is
> likely to be wrong half the time. The reason there are so many false
> positives in the final results of this hypothetical is that 99% of
> the samples are clean.

In your example, half of the positives are false; your numbers demonstrate this. If you were to line up all the positives and select one, what is the chance that it was actually clean? 50%. This percentage depends on the total number of samples (which determines the total number of false positives) and the total number of true positives. (See below.)

> It is tempting to say that because we know 50% of the positive
> results in the group are false, the chance of any one positive result
> being false is 50%. However, you have to remember that the
> probability of any one sample ending up as a real positive or a false
> positive is not equal. The chance of a sample from the dirty group
> testing positive is 99%, and the chance of a clean sample testing
> positive is 1%. The reason that 50% of the positives are false is
> that 99% of the samples are clean. The chance of a clean sample
> falling into the false positive group will always be 1%. No matter
> how you look at it, that basic fact does not change.

That is not the question being raised. The question is: what is the probability that a positive is false? This is not the same thing as the probability that a clean sample will test positive, which is 1%.

Whenever the number of false positives is close to (or greater than) the number of dirty samples, the test will not be valid. Imagine 1000 tests in which the accuracy of the test is 99% and the frequency of dirty samples is 0.1%. In this case, one would expect 10 false positives (1000 x .01) and only 1 true positive (1000 x .001). Thus almost every positive you find will be false (10/11). This is because the number of true positives is very low.

Take the same scenario in which half the group (500) are dirty. You would now expect only 5 false positives (500 clean x .01) but nearly 500 true positives. Thus the chance that a given positive test is wrong is about 5/500, and the test is much more valid.

In each of these cases the accuracy of the test is 99%, but the chance that a positive test is valid is very different, due to the actual number of true positives: low in the first case and high in the second.
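The two scenarios above are the same calculation with different prevalences. A small sketch (again assuming the thread's hypothetical 99% accuracy with symmetric errors) shows how the share of false positives among all positives falls as dirty samples become more common:

```python
# Share of positive results that are false, as a function of prevalence
# (the fraction of samples that are actually dirty), for a symmetric
# 99%-accurate test -- the thread's hypothetical figure.
def false_positive_share(prevalence, accuracy=0.99):
    error = 1.0 - accuracy
    true_pos = prevalence * accuracy          # dirty and flagged
    false_pos = (1.0 - prevalence) * error    # clean but flagged
    return false_pos / (true_pos + false_pos)

for prev in (0.001, 0.01, 0.1, 0.5):
    print(f"prevalence {prev:>5.1%}: "
          f"{false_positive_share(prev):.1%} of positives are false")
```

At 0.1% prevalence this gives roughly 91% (the 10-out-of-11 case above), and at 50% prevalence about 1% (the 5-out-of-500 case).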
#40
Numbers to think about
> Montesquiou wrote:
>
>> "CowPunk" wrote in the news message:
>>
>>> ... 1% of 12000 = 120
>>> 120:380 ~ 1:3
>>
>> Oh my friend !!! With all due respect, if that is the way they teach
>> statistics in your country, you are lost. However, as I have many
>> friends in the USA and I know they are not so ignorant in math, I
>> believe the problem is yours. Since your original post you DECIDED
>> that 1% of the tests were wrong. So 1% of the 380 positives (which
>> YOU DECIDED ON YOUR OWN) are wrong. 1% of 380 is 3.8. Turn your
>> problem any way you want: 1% is always 1% and NEVER 1:3 (33.33%)!!
>> Oh my God, please help me!
>
> I am here and I will help you. First, in each test there is an A
> sample and a B sample, and the test is done on each. So if there is a
> 1% chance of error on any given sample, then the probability of an
> error on both is found by multiplying .01 times .01, which gives
> .0001, or 0.01%.
>
> Second, do not assume any error percentage until one appears in the
> scientific literature, that is, one that has been established with a
> proven protocol by actually performing many blind tests on samples of
> known quality. One of the difficulties in this area is that test
> error rates have not been established and made publicly available.

If the errors on the A and B samples are entirely independent, then the chance of error on both would indeed be the product of the individual error rates, as far as non-deterministic errors are concerned. But if the errors in the overall testing system are deterministic in nature, related, or possibly occurring before the samples were separated (or something inherent in the testing procedure), then things don't decompose quite so easily into independent events.

-bdbafh
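bdbafh's point can be made concrete with a toy calculation. The 1% per-sample error rate is the thread's hypothetical, and the 0.5% shared-failure probability below is an invented illustration of a correlated (systematic) error:

```python
# A/B sample errors: independent vs. sharing a systematic failure mode.
p_err = 0.01  # hypothetical per-sample error rate from the thread

# If the two tests fail independently, the probabilities multiply.
p_both_independent = p_err * p_err
print(f"independent A/B errors: {p_both_independent:.4%}")  # 0.0100%

# With a shared failure mode (invented 0.5% chance that something
# upstream corrupts both samples before they are split), the combined
# error rate is dominated by the correlated term, not the product.
p_systematic = 0.005
p_both = p_systematic + (1 - p_systematic) * p_err * p_err
print(f"with a shared failure mode: {p_both:.4%}")
```

The second figure is roughly fifty times the first, which is why the multiply-the-error-rates argument only holds when the A and B analyses can truly fail independently.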