In 1974, Kathleen Saunders-Nordeen was appointed as the first Director of Women's Intercollegiate Athletics. In 1975 the Oral History Archive did an interview. Here are some excerpts.

Interviewer: "I do know that women have had a hard time at the University from way back-- the whole coeducational idea was vigorously fought at the turn of the century by the administration. Chadburn was for really considerable segregation of the sexes in terms of practically all activities."

KSN: "I think to a great extent, women perpetuated that."

Interviewer: "How do you mean?"

KSN: "Back in the late 20s or early 30s, we had some of the strongest leaders in the country here on our campus as far as Physical Education is concerned. And women's athletics have grown out of Physical Education, because that's where the leadership was, but they've also been slowed down a whole lot by Physical Education leaders and the feeling that it was wrong to put all your efforts and all your facilities and all your finances into making a small number of people really highly skilled and really concentrating all your efforts on a small number of people.

"The philosophy was that it was far better to have a thousand women who were really mediocre athletes, given the opportunity, and it was as if you couldn't possibly do both."

Interviewer: "Why not?"

KSN: "That's the point. Now maybe at that point resources were so limited that you couldn't, or maybe it was just real narrow kind of thinking. But this thinking, and I think Blanche Trilling who was here on our campus and does deserve a lot of credit in Physical Education, has been one of the leaders in the country who kept women's athletics from progressing.

"From then on, for 40 or 50 years, nothing happened in women's athletics. Before that, we had very strong women's teams in a lot of sports-- there were 250 women out for rowing in 1920. Of course they were largely intramural teams, but the interest was there, and the skill was there.

"But things just kind of died for about 40 or 50 years, and during that time the men's programs were still growing. And now when women are trying to get back into facilities and back into budgets, it's very difficult. We're trying to crack back into something that's been allowed to grow and we've been just dragging our feet."

==========

KSN: "The women's crew has done really very well, they've gotten themselves recognized nationally, which is very good in 3 or 4 years."

Interviewer: "That recently? Even though they've been rowing since the 20s?"

KSN: "Well, they rowed in the 20s, and then there was no rowing again until about three years ago when several women students from one of the lakeshore dorms decided it might be kind of fun to get out there in those shells. They came to me at Lathrop and we sent them to the men's crew coach who said, OK, We'll give it a try, see how many people you have interested. I don't know that he thought they'd make it through the first winter, because he really zapped it to them with the same kind of training that the guys have-- which meant running several miles in below zero weather, working in the tank rowing, and working on the odometers and lifting weights and doing all these things. Instead of that having that make them lose interest, it grew, because women really did want to push themselves and see how good they could become."

==========

KSN: "I'm sure the women's movement has helped a lot, because it's okay for women to be good at almost anything they want now. Physical activity just happens to be one of those things. It's kind of a new freedom or something."

==========

KSN: "Right now, it looks very much like the interpretation of Title IX is going to be that if there are scholarships given for men in any institution, then you've got to do it for women also. Right now the income producing sports are not exempted from the guidelines of Title IX; It looks to me that they probably will be exempted, if the guidelines come out at all.

"Now my understanding is that they're getting kicked around so much because of pressure groups like NCAA who don't want athletics included in it at all, that we may never get the guidelines out, which would be really a shame.

It was pointed out to me a couple weeks ago that even if the guidelines never come out, never get signed by the president, the impact that they've had and the threat that they've had to institutions has done an awful lot to programs for women getting started, just the idea that they were there. I think also some institutions, we have to give some credit, it's not just the threat behind Title IX but the concept that was pointed out so vividly by Title IX that women's athletic programs really have been short-changed and it really is discrimination."

## 18 March 2010

## 04 March 2010

### Shattering Coefficient and VC Dimension

Here is an excerpt from a book called "Reliable Reasoning, Induction and Statistical Learning Theory." I read this book about a year ago and really liked its description of VC dimension and shattering coefficient in statistical learning. These aren't easy concepts, so I wanted to put the description here for reference.

The setup is that you have a set of hypotheses or rules, C, that are used in order to classify data. For example, if we are trying to decide whether or not a patient is healthy, we may look at his resting pulse and systolic blood pressure. Then I could make a rule that says if his pulse is above 100 and his systolic blood pressure above 140, then he is not healthy. A rule like this helps us make a decision whether or not the patient is healthy. Now assume you do not have expert information (that high blood pressure is bad) but you have data points-- measurements of pulse and blood pressure-- and

"How can we use data to choose a good rule from C? One obvious idea is to select a rule from C with the least error on the data. Then we use that rule in order to classify new data. This is basically the method of enumerative induction. Our question then, is: How good is this version of enumerative induction for choosing a rule from C?

"Clearly, it depends on what rules are in the set C from which the rule is to be chosen. If all possible rules are in that set, then there will be many rules with least error on the data, and these will give completely different advice about new cases. ...

"Of course, restricting the rules in C runs the risk of not including the best of all possible rules, the rule with the least expected error on new cases. ...

"There are other possible inductive methods for choosing rules-- methods that do not just choose the rule with the least error on the data. One such method balances data-coverage against something else, such as the simplicity of a given rule. ... but now let us concentrate on what is needed for the sort of enumerative induction that simply chooses the rule in C with the least error on the data.

"So now consider the question of how the rules in C might be restricted if enumerative induction in this sense is to be guaranteed to work, given enough evidence, no matter what the background statistical probability distribution.

"The answer to this question is one of the great discoveries of statistical learning theory-- the discovery of the importance of the

... from p45: "What has to be true of the set C in order to guarantee that, with probability approaching 1, given more and more data, the expected error for the rules that enumerative induction endorses at each stage will approach the minimum value of expected error for rules in C?

"You might wonder whether this sort of convergence isn't guaranteed by the statistical law of large numbers. That principle implies that with probability approaching 1, the empirical error of any particular rule will approach the expected error of that rule, given more and more data. But this is not the same as what is wanted. The trouble is that, given infinitely many rules, as more and more data are taken into account, the rules endorsed by enumerative induction can change infinitely often. Even if the empirical error for each rule approaches a limit, that does not imply anything about the limit of the empirical error of the varying rule endorsed by enumerative induction at each stage. ...

"What is needed then is not just that the empirical error of each rule should converge to its expected error but also that the empirical error of the varying rules endorsed by enumerative induction should approach the value of the expected error of that rule in the limit. If c_n is a rule endorsed by enumerative induction after n data points, then what is needed is that the empirical error of the rule c_n after n data points should approach the expected error of c_n in the limit. ...

"This will happen if (with probability approaching 1) the empirical error of the rules in C converge

"What has to be true of the set of rules C for such uniform convergence? Vapnik and Chervonenkis (1968) show (in effect) that this condition is met for classification rules if and only if the set of classification rules C is not too rich, where the richness of the set is measured by what has come to be called its "VC dimension." ...

"Suppose that some set of N points in the feature space is

"Notice that the definition of VC dimension refers to

"Some other examples: The VC dimension of the set of all linear separations in D-dimensional space is D+1. The VC dimension of the set of all inner rectangles in the plane is 4. The VC dimension of the set of all unions of rectangles in the plane is infinite.

"So, that is what VC dimension comes to. Vapnik and Chervonenkis (1968) show roughly that enumerative induction is guaranteed to work no matter what the background probability distribution if and only if the classification rules in C have a finite VC dimension."

The setup is that you have a set of hypotheses or rules, C, that are used in order to classify data. For example, if we are trying to decide whether or not a patient is healthy, we may look at his resting pulse and systolic blood pressure. Then I could make a rule that says if his pulse is above 100 and his systolic blood pressure above 140, then he is not healthy. A rule like this helps us make a decision whether or not the patient is healthy. Now assume you do not have expert information (that high blood pressure is bad) but you have data points-- measurements of pulse and blood pressure-- and

*labels*for each data point which say "healthy" or "not healthy". From here we would like to*learn*a rule that can classify other patients as healthy or not healthy, using the same measurements. From p23:"How can we use data to choose a good rule from C? One obvious idea is to select a rule from C with the least error on the data. Then we use that rule in order to classify new data. This is basically the method of enumerative induction. Our question then, is: How good is this version of enumerative induction for choosing a rule from C?

"Clearly, it depends on what rules are in the set C from which the rule is to be chosen. If all possible rules are in that set, then there will be many rules with least error on the data, and these will give completely different advice about new cases. ...

"Of course, restricting the rules in C runs the risk of not including the best of all possible rules, the rule with the least expected error on new cases. ...

"There are other possible inductive methods for choosing rules-- methods that do not just choose the rule with the least error on the data. One such method balances data-coverage against something else, such as the simplicity of a given rule. ... but now let us concentrate on what is needed for the sort of enumerative induction that simply chooses the rule in C with the least error on the data.

"So now consider the question of how the rules in C might be restricted if enumerative induction in this sense is to be guaranteed to work, given enough evidence, no matter what the background statistical probability distribution.

"The answer to this question is one of the great discoveries of statistical learning theory-- the discovery of the importance of the

*Vapnik-Chervonenkis*dimension, or VC dimension, of a set of rules. The VC dimension is a measure of the "richness" of the set of rules and it is inversely related to the degree of falsifiability of the set.... from p45: "What has to be true of the set C in order to guarantee that, with probability approaching 1, given more and more data, the expected error for the rules that enumerative induction endorses at each stage will approach the minimum value of expected error for rules in C?

"You might wonder whether this sort of convergence isn't guaranteed by the statistical law of large numbers. That principle implies that with probability approaching 1, the empirical error of any particular rule will approach the expected error of that rule, given more and more data. But this is not the same as what is wanted. The trouble is that, given infinitely many rules, as more and more data are taken into account, the rules endorsed by enumerative induction can change infinitely often. Even if the empirical error for each rule approaches a limit, that does not imply anything about the limit of the empirical error of the varying rule endorsed by enumerative induction at each stage. ...

"What is needed then is not just that the empirical error of each rule should converge to its expected error but also that the empirical error of the varying rules endorsed by enumerative induction should approach the value of the expected error of that rule in the limit. If c_n is a rule endorsed by enumerative induction after n data points, then what is needed is that the empirical error of the rule c_n after n data points should approach the expected error of c_n in the limit. ...

"This will happen if (with probability approaching 1) the empirical error of the rules in C converge

*uniformly*to their expected error. ..."What has to be true of the set of rules C for such uniform convergence? Vapnik and Chervonenkis (1968) show (in effect) that this condition is met for classification rules if and only if the set of classification rules C is not too rich, where the richness of the set is measured by what has come to be called its "VC dimension." ...

"Suppose that some set of N points in the feature space is

*shattered*by rules in C in the sense that, for any possible labeling of those points, some rule in C perfectly fits the points so labeled. Then the VC dimension of the set of rules C is at least N. More specifically, the VC dimension of a set of rules C is the largest number N such that some set of N points in feature space is shattered by rules in C. ..."Notice that the definition of VC dimension refers to

*some*set of N points being shattered, not to all sets of N points being shattered. Consider the set of all linear classifications of points in the plane where YESes and NOs are separated by a straight line. The VC dimension of this set of classification rules is 3, because some sets of three points in the plane can be shattered by this class of rules and no set of four points can be shattered. Three collinear points (i.e., three points on the same straight line) cannot be shattered by this class of rules, because there is no such rule that can classify the middle point as a YES and the outer points as NOs. But three points that are not collinear can be shattered... And no four points can be shattered by this class of rules, so the VC dimension of these linear rules is exactly 3. ..."Some other examples: The VC dimension of the set of all linear separations in D-dimensional space is D+1. The VC dimension of the set of all inner rectangles in the plane is 4. The VC dimension of the set of all unions of rectangles in the plane is infinite.

"So, that is what VC dimension comes to. Vapnik and Chervonenkis (1968) show roughly that enumerative induction is guaranteed to work no matter what the background probability distribution if and only if the classification rules in C have a finite VC dimension."

Subscribe to:
Posts (Atom)