JANUARY 2000
Johnny might not be math-challenged; his problem could just be that he’s an auditory learner
“I’m not stupid, I’m auditory,” was how one student reacted after taking the learning-styles diagnostic test developed and administered by Diablo Valley College, a public, two-year community college outside San Francisco, California. “I realize there’s nothing ‘wrong’ with me; I just process information differently,” was another student’s comment.
These are just two of several success stories cited by Diablo Valley College mathematics instructor Suzanne Miller in a report on the test she gave at the California Mathematics Council meeting in Monterey, CA, last month.
The diagnostic is based on Miller’s research into learning styles, and was written by the college’s learning disability specialist Catherine Jester. It comprises 32 multiple-choice questions designed to ascertain a student’s natural learning style, and has been freely available on the Web since January 1998. It takes just a few minutes to complete, and the result—a profile of the student’s learning style along with specific suggestions of how best to study—is available immediately. To date, over 10,000 students from Diablo Valley College and elsewhere have used it to overcome math anxiety and improve their performance in mathematics.
Miller’s initial analysis of the data generated by the test indicates that there may be a good reason why so many people find learning math so difficult. Among males aged between 18 and 25, Miller found, just 17% are suited to learning through reading text. For the remaining 83%, the standard college textbook is little more than dead weight to carry around in their bag! The figure for women in the same age group is a bit higher: just under 35% can learn from textually presented information.
These figures contrast with those for students aged 35 or over—a substantial population in today’s community colleges. In this age-group, 27% of males and over 42% of females find it natural to learn from reading. But that’s still less than half the student population. Miller does not know whether the difference between the two age-groups is a direct consequence of growing older, or is a reflection of changes in the environment in which today’s under-25s grew up. The answer to that question will have to await a follow-up study she hopes to carry out when today’s younger respondents grow older.
By far the most powerful method of learning among all age-groups is visual nonverbal: diagrams, tables, illustrations, pictures, and video. Among the 18-25 age-group, 48.1% of males and 36.2% of females favor this method of learning. The figures for the over-35s are almost identical: 46.0% and 38.8%. Half a century after the dawn of the television age, these results are perhaps not surprising. But the vast majority of mathematics courses are still structured around the traditional college textbook.
At a time when many people are taking college courses on the Internet, it is worth taking note of another of Miller’s findings: that a surprisingly high proportion of people learn best from listening. In the 18-25 age-group, 38.0% of males and 31.3% of females are predominantly auditory learners; among the over-35s, 35.2% of males and 25.7% of females. If providers of Internet-based education want to reach those individuals, they had better provide instruction by voice as well as text, illustrations, and video.
Among Miller’s own online math students is a married couple with a 4 month old child. The husband is visual non-verbal, draws pictures and can then ‘see’ the answer to the problem. His wife is a reader who works step by step. As Miller remarks, “They have not been able to collaborate because his way confuses her and vice versa.”
The fourth group of learners Miller’s study has identified are those who learn best in a tactile or kinesthetic fashion, by manipulating objects or gesticulating with their hands. Among the younger age-group, 20.2% of men and 20.7% of women learn best in this fashion; among the older students, the figures are 14.1% and 13.1%, respectively. Don’t expect those individuals to succeed unless they are free to stand up and move around.
Creating even more of a challenge for teachers, between 20 and 24% of students do not fall cleanly into one particular category, but exhibit a hybrid learning style that spans two or more of the four categories.
I took the test myself. It diagnosed me (correctly) as primarily suited to learning by reading and secondarily by visually presented information (diagrams, graphs, tables, etc.). Among the seven specific suggestions to improve my learning, my diagnostic report said I should “Write out sentences and phrases that summarize key information obtained from your textbook and lecture”, that I should “Make flashcards of vocabulary words and concepts that need to be memorized,” and that “When learning information presented in diagrams or illustrations, write out explanations for the information.” I did all of these when I was a student. Indeed, the only suggestions I did not use are those involving computers, which were not available in my own student days. Looking back, I was lucky that my learning style fitted so well into the educational system prevalent at the time. I did well. Many are not so fortunate.
In her presentation at Monterey, Miller observed that “The project has made faculty more aware of the importance of understanding diverse learning styles and designing course work to reach the broadest possible spectrum of styles.” Perhaps more significant, she says, “It helps students by identifying their strengths, encouraging them to become active managers of their educational resources and to take responsibility for their learning.”
Miller’s study had 3,596 males and 2,998 females in the 18-25 age-group and 707 males and 1,480 females aged 35 and over. The learning styles survey is available on the web at: http://silcon.com/~scmiller/lsweb/dvclearn.htm .
Devlin’s Angle is updated at the beginning of each month.
Keith Devlin ( devlin@stmarys-ca.edu) is Dean of Science at Saint Mary’s College of California, in Moraga, California, and a Senior Researcher at Stanford University. His latest book InfoSense: Turning Information Into Knowledge, which shows how a mathematical approach to information can help us to understand information flow and manage it more efficiently, was published by W. H. Freeman last August.
FEBRUARY 2000
The legacy of the Reverend Bayes
How do you use inconclusive evidence to assess the probability that a certain event will occur? One method that has become increasingly popular in recent years depends on a mathematical theorem proved by an 18th-century English Presbyterian minister by the name of Thomas Bayes. Curiously, Bayes’ theorem languished largely ignored and unused for over two centuries before statisticians, lawyers, medical researchers, software developers, and others started to use it in earnest during the 1990s.
What makes this relatively new technique of “Bayesian inference” soft mathematics is that it uses an honest-to-goodness mathematical formula (Bayes’ Theorem) in order to improve—on the basis of evidence—the best (human) estimate that a particular event will take place. In the words of some statisticians, it’s “mathematics on top of common sense.” You start with an initial estimate of the probability that the event will occur and an estimate of the reliability of the evidence. The method then tells you how to combine those two figures—in a precise, mathematical way—to give a new estimate of the event’s probability in the light of the evidence. In some highly constrained situations, both initial estimates may be entirely accurate, and in such cases Bayes’ method will give you the correct answer. In a more typical real-life situation, you don’t have exact figures, but as long as the initial estimates are reasonably good, then the method will give you a better estimate of the probability that the event of interest will occur. Thus, in the hands of an expert in the domain under consideration, someone who is able to assess all the available evidence reliably, Bayes’ method can be a powerful tool.
For example, suppose that you undergo a medical test for a relatively rare cancer. Your doctor tells you that, according to surveys by medical statisticians, the cancer has an incidence of 1% among the general population. Thus, before you take the test, and in the absence of any other evidence, your best estimate of your likelihood of having the cancer is 1 in 100, i.e. a probability of 0.01. Then you take the test. Extensive trials have shown that the reliability of the test is 79%. More precisely, although the test does not fail to detect the cancer when it is present, it gives a positive result in 21% of the cases where no cancer is present—what is known as a “false positive.” When you are tested, the test produces a positive diagnosis. The question is: Given the result of the test, what is the probability that you have the cancer?
Most people assume that if the test has a reliability rate of nearly 80%, and they test positive, then the likelihood that they have the cancer is about 80% (i.e., the probability is approximately 0.8). But they are way off. Given the scenario just described, the likelihood that they have the cancer is a mere 4.6% (i.e., the probability is 0.046). Still a worrying possibility, but hardly that scary 80%. The problem is that the (scary) 80% reliability figure for the test has to be balanced against the (more reassuring) low 1% incidence rate of the cancer in the general population. Using Bayes’ method ensures you make proper use of all the evidence to hand.
In general, Bayes’ method shows you how to calculate the probability of a certain event E (in the above example, having the cancer), based on evidence (e.g. the result of the medical test), when you know (or can estimate):
(1) the probability of E in the absence of any evidence;
(2) the evidence for E;
(3) the reliability of the evidence (i.e., the probability that the evidence is correct).
In the cancer example, the probability in (1) is 0.01, the evidence in (2) is that the test came out positive, and the probability in (3) has to be computed from the 79% figure given. All three pieces of information are highly relevant, and to evaluate the probability that you have the cancer you have to combine them in the right manner. Bayes’ method tells you how to do this. Here’s how.
To keep the arithmetic simple, let’s assume a total population of 10,000 people. Since all we are ultimately concerned about is percentages, this simplification will not affect the final answer. Let’s assume in addition that the various probabilities are reflected exactly in the actual numbers. Thus, of the total population of 10,000, 100 will have the cancer, 9,900 will not.
Bayes’ method is about improving an initial estimate after you have obtained new evidence. In the absence of the test, all you could say about the likelihood of you having the cancer is that there is a 1% chance that you do. Then you take the test, and it shows positive. How do you revise the probability that you have the cancer?
Well, there are 100 individuals in the population who do have the cancer, and for all of them the test will correctly give a positive prediction, thereby identifying 100 individuals as having the cancer.
Turning to the 9,900 cancer-free individuals, for 21% of them the test will incorrectly give a positive result, thereby identifying 9,900 x 0.21 = 2,079 individuals as having the cancer.
Thus, in all, the test identifies a total of 100 + 2,079 = 2,179 individuals as having the cancer. Having tested positive, you are among that group. (This is precisely what the test evidence tells you.) The question is, are you in the subgroup that really does have the cancer, or is your test result a false positive?
Of the 2,179 individuals identified by the test, 100 really do have the cancer. Thus, the probability that you are among that group is 100/2,179 = 0.046. In other words, there is a 4.6% probability that you have the cancer.
The above computation shows why it is important to take account of the overall incidence of the cancer in the population—what is sometimes referred to as the base rate or the prior probability. In a population of 10,000, with a cancer having an incidence of 1%, a test with a reliability of 79% (i.e., 21% false positives) will produce 2,079 false positives. This far outweighs the number of actual cancer cases, which is 100. As a result, when your test result comes back positive, the chances are overwhelming that you are in the false positive group.
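The counting argument just given is easy to reproduce in a few lines of code. Here is a minimal sketch in Python, using the figures from the example (the variable names are my own, chosen for illustration):

```python
# Frequency-based version of the worked example: in a population of
# 10,000, count how many people would test positive, and how many of
# those actually have the cancer.

population = 10_000
incidence = 0.01             # 1% of the population has the cancer
false_positive_rate = 0.21   # test wrongly flags 21% of healthy people

have_cancer = population * incidence        # 100 people
cancer_free = population - have_cancer      # 9,900 people

true_positives = have_cancer                # the test never misses a real case
false_positives = cancer_free * false_positive_rate   # 2,079 people

all_positives = true_positives + false_positives      # 2,179 people
prob_cancer_given_positive = true_positives / all_positives

print(round(prob_cancer_given_positive, 3))  # 0.046
```

Running this confirms the 4.6% figure: the false positives swamp the genuine cases by more than twenty to one.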
To avoid having to go through the same kind of reasoning every time, Bayes codified the method into a single formula—Bayes’ theorem. Let P(H) be the numerical probability that the hypothesis H is correct in the absence of any evidence—the prior probability. In the above example, H is the hypothesis that you have the cancer and P(H) is 0.01 (1%). You then take the test and obtain a positive outcome; this is the evidence E. Let P(H|E) be the probability that H is correct given the evidence E. This is the revised estimate you want to calculate. Let P(E|H) be the probability that the evidence E would be found if H were indeed correct. In the example, the test always detects the cancer when it is present, so (unusually) P(E|H) = 1 in this case. To compute the new estimate, you first have to calculate P(H-wrong), the probability that H is incorrect, which is 0.99 in our example. And you have to calculate P(E|H-wrong), the probability that the evidence E would be found (i.e., the test comes out positive) even though H is incorrect (i.e., you do not have the cancer), which is 0.21 in the example. Bayes’ theorem says that:
P(H|E) = P(H) x P(E|H)/[P(H) x P(E|H) + P(H-wrong) x P(E|H-wrong)]
Using the formula for our example:
P(H|E) = 0.01 x 1/[0.01 x 1 + 0.99 x 0.21] = 0.046
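The theorem translates directly into a small function. The following is a sketch in Python (the function and parameter names are mine, not part of Bayes’ formulation):

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(H|E) given the prior P(H), P(E|H), and P(E|H-wrong)."""
    p_wrong = 1.0 - prior                      # P(H-wrong)
    numerator = prior * p_evidence_if_true     # P(H) x P(E|H)
    denominator = numerator + p_wrong * p_evidence_if_false
    return numerator / denominator

# The cancer-test example: prior 0.01, the test always detects the
# cancer when present (P(E|H) = 1), and gives a false positive 21%
# of the time (P(E|H-wrong) = 0.21).
posterior = bayes_update(0.01, 1.0, 0.21)
print(round(posterior, 3))  # 0.046
```

The same function can be reused with any prior and any pair of likelihoods, which is precisely the appeal of codifying the reasoning as a formula.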
A quantity such as P(H|E) is known as a conditional probability—the conditional probability of H occurring, given the evidence E. Unscrupulous lawyers have been known to take advantage of the lack of mathematical sophistication among judges and juries by deliberately confusing the two conditional probabilities P(G|E), the probability that the defendant is guilty given the evidence, and P(E|G), the conditional probability that the evidence would be found assuming the defendant were guilty. Deliberate misuse of probabilities has been known to occur where scientific evidence such as DNA testing is involved, such as paternity suits and rape and murder cases. In such cases, prosecuting attorneys may provide the court with a figure for P(E), the probability that the evidence could be found among the general population, whereas the figure of relevance in deciding guilt is P(G|E). As Bayes’ formula shows, the two values can be very different, with P(G|E) generally much lower than P(E). Unless there is other evidence that puts the defendant into the group of possible suspects, such use of P(E) is highly suspect, and indeed should perhaps be prohibited. The reason is that, as with the cancer test example, it ignores the initial low prior probability that a person chosen at random is guilty of the crime in question.
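To see just how far apart P(E|G) and P(G|E) can be, here is a hypothetical sketch. The match probability and the size of the pool of possible culprits are invented purely for illustration; they are not taken from any actual case:

```python
# Hypothetical DNA-evidence example. Suppose a forensic test matches
# an innocent person with probability 1 in 100,000, always matches the
# true culprit, and the defendant was drawn from a pool of 1,000,000
# people who could in principle have committed the crime, so the prior
# probability of guilt P(G) is 1/1,000,000.
prior = 1 / 1_000_000
p_match_if_guilty = 1.0          # P(E|G)
p_match_if_innocent = 1 / 100_000  # P(E|not G)

posterior = (prior * p_match_if_guilty) / (
    prior * p_match_if_guilty + (1 - prior) * p_match_if_innocent
)
print(round(posterior, 2))  # 0.09
```

Under these (made-up) assumptions, a match probability of 1 in 100,000 sounds damning, yet the probability of guilt given the match is only about 9%, because the pool of potential suspects contains roughly ten innocent people who would also match.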
Instructing the court in the proper use of Bayesian inference was the winning strategy used by American long-distance runner Mary Slaney’s lawyers when they succeeded in having her 1996 performance ban overturned. Slaney failed a routine test for performance-enhancing steroids at the 1996 Olympic Games, resulting in the US athletic authorities banning her from future competitions. Her lawyers demonstrated that the test did not take proper account of the prior probability and thus made a tacit initial assumption of guilt.
In addition to its use—or misuse—in court cases, Bayesian inference methods lie behind a number of new products on the market. One example is the paperclip advisor that pops up on the screen of Microsoft Office users: the system monitors the user’s actions and uses Bayesian inference to predict likely future actions and provide appropriate advice accordingly. For another example, chemists can take advantage of a software system that uses Bayesian methods to improve the resolution of nuclear magnetic resonance (NMR) spectrum data. Chemists use such data to work out the molecular structure of substances they wish to analyze. The system uses Bayes’ formula to combine new data from the NMR device with existing NMR data, a procedure that can improve the resolution of the data by several orders of magnitude.
Other recent uses of Bayesian inference are in the evaluation of new drugs and medical treatments, the analysis of human DNA to identify particular genes, and in analyzing police arrest data to see if any officers have been targeting one particular ethnic group.
MARCH 2000
Stealing Copernicus
The theft last month of a first edition copy of Nicolaus Copernicus’s classic text De revolutionibus orbium coelestium (“On the revolution of the heavenly spheres”) was the seventh such disappearance of this valuable work in recent years—a chain of thefts that has left police from the United States to the former Soviet Union doing what the missing texts say the planets do around the sun: going round in circles. For De revolutionibus, as it is more commonly known, was the book in which Copernicus first presented the heliocentric model of the solar system. It is arguably the first scientific text of the modern scientific era.
The book was first published in 1543; of the first-edition printing of (it is thought) about 500 copies, 260 are known to survive, each currently worth up to $400,000. All the major scientific figures who came after Copernicus owned copies of the book (though not necessarily the first edition), including Brahe, Kepler, Bacon, Descartes, Galileo, Newton, and Halley. In many cases they annotated their copy, thereby making it even more valuable on today’s rare-books market. (The original handwritten version is kept at the Jagiellonian University in Krakow, Poland, the university where Copernicus received his education. You can view the entire manuscript on their website: www.bj.uj.edu.pl/bjmanus/revol/titlpg_e.html )
The discovery in early February of the latest theft, from the Academy of Sciences Library in St. Petersburg, led Russian police to seek the assistance of Interpol in trying to recover it.
An earlier theft, in Kiev, Ukraine, occurred in 1998, when a thief used a fake police ID to gain access to the archives at the Ukrainian National Library. According to the librarians, the man first asked to examine six books, including De revolutionibus. Some time later, he returned the books, obtained a receipt, and left to take a break. When he came back, he requested more books, again including the Copernicus. Then, when he left the building just before closing time, he showed the guard the initial receipt to “prove” that he had returned the book.
Three months later, a man in his forties walked into the library of the Polish Academy of Sciences in Krakow, Poland, and asked to read a first-edition copy of De revolutionibus valued at $320,000. A short while later he told the authorities he had to use the bathroom, from where he slipped out with the book, leaving behind only its cover.
Copies have also disappeared from the University of Illinois at Champaign-Urbana and the Mittag-Leffler Institute in Stockholm, Sweden. Two of the stolen copies were subsequently recovered when they surfaced on the international rare-book market, but five remain missing.
Recovery of any stolen copy offered for sale publicly is made virtually certain because of a detailed catalog of all known copies drawn up over a twenty-five year period by Owen Gingerich, a professor at the Harvard-Smithsonian Center for Astrophysics in Cambridge, Massachusetts. Commenting in the press following the recent theft, Gingerich says there is no evidence to suggest an international conspiracy to steal copies of the treatise.
Since Gingerich’s extensive catalog means that stolen copies cannot be offered for public sale, it is likely that the thefts are the work of isolated individuals who simply want to own a copy of one of the most pivotal publications in human history—the work that established our present-day understanding of the solar system as having the sun at the center, with the planets, including Earth, rotating around it. Thieves might be tempted by the fact that, although first-edition copies are of considerable value, with at least 260 of them in existence few are kept under great security, and many copies are easily accessible in the reference sections of public libraries.
Certainly, the thieves are not stealing the book because they are in search of a gripping read. For one thing, it’s in Latin, as were all scientific publications at the time. Second, for the most part the book consists of page after page of long and tedious numerical and geometric reasoning, interspersed with numerical table after numerical table, with only the 142 geometric illustrations to break the monotony. (It took Copernicus over thirty years to complete the book.) For what Copernicus does is work out all the arithmetic details of a model of the solar system consisting of the sun at the center and the six known planets—Mercury, Venus, Earth, Mars, Jupiter, and Saturn—revolving around the sun in concentric circular orbits. Since one of his original goals in writing the book was to provide a more reliable ecclesiastical calendar, he also provides a number of tables that allow readers to predict the positions of planets and determine the dates of various religious observances.
Astronomy from Vodka
Although the sudden spate of thefts of De revolutionibus makes for an intriguing news story, it is nothing compared to the story behind the original publication of the work, which has humor, drama, tragedy, a deathbed scene that Hollywood could not have scripted better, and all the intrigue, behind-the-scenes activity, excesses, politics, and hype we associate with the present-day publication of a sure-fire bestseller.
In fact, the story really begins back in ancient Greece. For, contrary to popular belief, Copernicus was by no means the first to suggest that the sun was the center of the solar system (more accurately, of the entire universe, which was then thought to consist of the five planets and the sun, all surrounded by an outer sphere of stars). The first person who seems to have seriously considered the possibility was Aristarchus of Samos. He rejected the hypothesis only on the grounds that to reconcile heliocentricity with the observed motions (or lack of motion) of some of the stars required the stars to be much further away (by a factor of forty or so) than was assumed at the time to be the case. (In fact, we now know that the nearest stars are much, much further away than the distance that misled Aristarchus.)
What Copernicus did, that no one had done previously, was work out the mathematical details of the heliocentric model, based on the astronomical data available at the time. This, of course, is exactly what Ptolemy had done for the geocentric model of the universe over a thousand years earlier, in the second century AD. There were two reasons why Copernicus’s work was accepted almost immediately, and Ptolemy’s abandoned. First, it was conceptually much simpler and (consequently) mathematically easier to deal with. In order to account for the observational data, Ptolemy had to assume the sun and the planets moved around the fixed earth on complicated orbits consisting of circles drawn on circles (so-called epicycles), leading to some complicated geometry and tricky arithmetic. In Copernicus’s model, the orbits of the planets around the sun are simply concentric circles, for which the arithmetic is easy. (Of course, as Kepler was subsequently to demonstrate, the orbits are not circles but ellipses — and even that is only approximately true.) The second advantage of the Copernican system over the Ptolemaic was that it was marginally more accurate. In particular, it was easier to locate Venus and Mercury accurately in the Copernicus model.
Copernicus was born in 1473 in Torun. When he was ten years old, his father, a prosperous merchant, died, and Nicolaus went to live with his maternal uncle, Bishop Lucas Watzenrode, the Lord of Warmia, a tiny Polish feudal holding of the Catholic Church. His early schooling was first in Torun, then Wloclawek, where he first encountered astronomy. The teacher who introduced Copernicus to the subject he was to revolutionize was a man named Vodka—a man who, according to an oft-repeated story, followed the common practice among scholars of adopting a Latinized version of their name by naming himself “Vodka Abstemius.”
In 1491, Copernicus began his university studies in Krakow. Although the principal objects of his study were intended to be canon law and the Latin and Greek classics, we know he spent some time learning mathematics and astronomy since he bought several books on the subjects, at least some of which are still extant, complete with marginal notes written in his own hand.
From Krakow, he went to Bologna in 1496, where again his growing interest in astronomy tended to push aside his studies of church law and the classics.
In 1500, Copernicus made an Easter pilgrimage to Rome, where, along with 200,000 fellow pilgrims, he received the blessing of Pope Alexander VI. The following year he made a brief trip back to Warmia to be appointed to the Chapter of Frombork Cathedral, returning to Italy to continue his studies in Padua and then in Ferrara, where he obtained the degree of Doctor of Canon Law in 1503. Degree in hand, he returned home to Warmia, to serve as canon of the Cathedral Chapter of Frombork. It was from the turret in which he lived that he made his astronomical observations, although he was far more a theoretical astronomer than an observer, and based his work on the heliocentric model of the universe mostly on the observations of others.
Don’t blame the messenger
Copernicus knew from the start that publication of his work would lead to trouble with the Roman Catholic church, and decided that his best strategy would be to allow his results to trickle out slowly. Thus, although De revolutionibus was starting to take shape, Copernicus did not seek to have it published. Instead, some time before 1514, he released a short summary of his work called Commentariolus, which began to circulate in handwritten form. It is known that one copy found its way into the hands of the famous Danish astronomer Tycho Brahe.
Eventually, rumors of Copernicus’s work reached Georg Rheticus, a young professor of mathematics at the Lutheran University of Wittenberg in Germany. Fascinated by what he heard, in 1539 Rheticus traveled to Poland to visit Copernicus. The two hit it off, and Rheticus ended up staying at Frombork for two years, during which time he persuaded Copernicus to let him study the virtually completed De revolutionibus.
Rheticus was enthralled, and tried to persuade Copernicus to publish the work. When his efforts proved in vain, the young German wrote a brief summary, publishing it in 1540 under the title Narratio Prima (“First Report”). The little booklet was enthusiastically received, and, probably as a result, when Rheticus left Frombork in 1541, Copernicus allowed him to take a complete copy of De revolutionibus to arrange for its publication. Rheticus entrusted publication of the manuscript to Johann Petrius, in Nuremberg, one of the leading scientific publishers of the day.
Unfortunately, Rheticus left Wittenberg soon afterward to take up a professorship at Leipzig, and left oversight of the printing to a Lutheran theologian called Andreas Osiander—unfortunate because the latter saw fit to do some unauthorized tinkering with the manuscript. When Rheticus received the first copies of the printed book in April 1543, he saw that the title had been changed. Instead of De revolutionibus, which was Copernicus’s title, the printed version read De revolutionibus orbium coelestium.
In addition to the change in the title, someone had also inserted an anonymous introduction that read in part:
Since the novelty of the hypotheses of this work has already been widely reported, I have no doubt that some learned men have taken serious offense because the book declares that the Earth moves; these men undoubtedly believe that the long established liberal arts should not be thrown into confusion. But if they examine the matter closely, they will find that the author of this work has done nothing blameworthy. For it is the duty of an astronomer to record celestial motions through careful observation. Then, turning to the causes of these motions he must conceive and devise hypotheses about them, since he cannot in any way attain to the true cause. … The present author has performed both these duties excellently. For these hypotheses need not be true nor even probable; if they provide a calculus consistent with the observations, that alone is sufficient. … Now when there are offered for the same motion different hypotheses, the astronomer will accept the one which is the easiest to grasp. … let no one expect anything certain from astronomy, which cannot furnish it, lest he accept as the truth ideas conceived for another purpose, and depart from this study a greater fool than when he entered it. Farewell.
Rheticus suspected (correctly, as it turned out) that Osiander had made the changes. He vowed that if he ever had concrete proof, he would beat Osiander up. In the event, Rheticus limited his response to striking out both the preface and the additional words in the title with a red pen in the two copies in his possession. No one knows what Copernicus himself thought of the changes, since the first he saw of the printed version of his magnum opus was when it was delivered to him on his deathbed. (Or so the story goes; publication certainly took place close to 24 May, 1543, the day of Copernicus’s death.)
In addition to the title change and the addition of an unauthorized preface, someone—presumably either Rheticus or Osiander—adorned the cover with an introduction every bit as hyped and flowery as might adorn a present-day bestseller:
Diligent reader, in this work, which has just been created and published, you have the motions of the fixed stars and planets, as these motions have been reconstituted on the basis of ancient as well as recent observations, and have moreover been embellished by new and marvelous hypotheses. You also have most convenient tables, from which you will be able to compute those motions with the utmost ease for any time whatever. Therefore, buy, read, and enjoy.
This breathless passage is followed by the slogan, taken from the entrance to the famed Plato’s Academy in ancient Greece: Let no one untrained in geometry enter here.
Although he had been persuaded to have his work published, Copernicus remained mindful of the likely reaction of the Roman Catholic church—his church. He prefaces the book—and but for the insertion of Osiander’s introductory remarks, these would have been the first words to greet the reader—with a letter written to him by his friend Nicholas Schoenberg, the Cardinal of Capua. In that letter, Schoenberg praises Copernicus’s prowess as an astronomer, and urges him to publish his important new theory:
I have also learned that you have written an exposition of this whole system of astronomy, and have computed the planetary motions and set them down in tables, to the greatest admiration of all. Therefore, with the utmost earnestness I entreat you, most learned sir, unless I inconvenience you, to communicate this discovery of yours to scholars.
Not content with this endorsement, Copernicus follows it up with a lengthy letter, addressed “TO HIS HOLINESS, POPE PAUL III, NICHOLAS COPERNICUS’ PREFACE TO HIS BOOKS ON THE REVOLUTIONS.” In this letter, Copernicus acknowledges that many readers are likely to be shocked by the new theory. He stresses that, mindful of such reactions, he delayed publication for many years. The final decision to publish, he says, came after much soul searching, and then only at the strong urging of Cardinal Schoenberg, Bishop Tiedemann Giese of Chelmo, and other prominent clerics and scholars.
He was, he continues, forced into the conclusions he reached by the overwhelming mass of the evidence, evidence that, as a scientist, he could not ignore, however unpalatable some might find those conclusions. He quotes from Plutarch to show that the ancient Greeks also considered the possibility that the sun remains still while the earth and the other planets revolve around it. And don’t forget, he adds, that my initial reason for engaging on this work in the first place was the need for a more accurate ecclesiastical calendar.
Pope Paul’s reaction to Copernicus’s work is not known. Certainly, there was no great furor over the heliocentric model until Galileo forced the issue a generation later by instigating a showdown with the church authorities. For the most part, scholars seemed to view De revolutionibus in much the way that Andreas Osiander suggested in his unauthorized preface: not so much as a theory of how things really were, but rather as a useful piece of mathematics that happened to be based on a particular hypothesis.
Or did they? Perhaps they just wanted to stay out of trouble in the event that a furor did erupt. After all, there was no shortage of scholars eager to read the book, and the first edition soon sold out, leading to the publication of a second edition in Basel in 1566. Would they have taken the trouble to study the book so closely, as many of them did, if they had viewed it as “purely hypothetical”?
Tracking the first editions
And so to the present day. Or rather to 1970, when Gingerich embarked on his quest to track down all remaining copies of the first edition—a literary detective story not without its lighter moments.
For instance, Gingerich found that Trinity College Cambridge possessed not one but three first edition copies of De revolutionibus. Did they really need three? you might ask. Apparently, Trinity College wondered the same thing at one time. Deciding that two copies easily met their needs, they contemplated selling off one copy at auction. But someone had the good sense to imagine how it would look to the rest of the world if England’s oldest and most wealthy college, the home of Sir Isaac Newton, no less, were to sully its hands selling off such a valuable piece of scientific history. And so Trinity College continues to own three first edition copies.
Another two copies that Gingerich traced were owned by Eton College in England, though quite why an elite, private preparatory school should want even one copy of such a work is not clear, unless the school authorities at the time felt that the future ruling class of His Majesty’s kingdom would somehow benefit from breathing in air infused by the vapors from such a revered masterpiece.
On one occasion, Gingerich recounts, he heard of a second-edition copy held, rather improbably, by a library in Liverpool, a predominantly poor, working class, industrial port in the north of England. He wrote to the library asking for details. The librarian obliged, adding: “I suppose it is the second edition you are particularly interested in for we do have two copies of the first!”
Another first edition turned up at the Victoria and Albert Museum in London, which had acquired it purely as an example of decorative art! At some time in its life, the volume in question had been given a highly decorative cover, modeled after a Grolier binding.
Some of the copies Gingerich located had suffered damage of one kind or another: water, damp, and, in not a few cases, the nibblings of mice making a nest. But many first-edition copies still exist in excellent condition. Thus, the recent rash of thefts is unlikely to put this important historical document beyond the reach of the ordinary citizen. Still, as someone who sees scientific knowledge as common property, I find it a little obscene that a few individuals seek private and exclusive ownership of a scientific work that its creator published to make it available for all mankind.
Devlin’s Angle is updated at the beginning of each month.
Keith Devlin (devlin@stmarys-ca.edu) is Dean of Science at Saint Mary’s College of California, in Moraga, California, and a Senior Researcher at Stanford University. His latest book InfoSense: Turning Information Into Knowledge, which shows how a mathematical approach to information can help us to understand information flow and manage it more efficiently, was published by W. H. Freeman last August.
APRIL 2000
The Law of Small Errors
As any author of a book or a long article will attest, no matter how carefully you proofread your work, the moment the published version lands in your hands and you self-admiringly flick through the pages, you’ll find an error. Within minutes of picking the work up, your elation at seeing the results of your labors for the first time will be dashed, when you spot that glaring error that not only managed to creep in at some stage, but somehow lay unnoticed through several careful readings. Lay unnoticed until now, that is. Only when it’s too late to correct does it leap out of the page. “Gotcha!” it cries gleefully.
It’s the Law of Small Errors: As the number of words in a manuscript tends to infinity, the probability of a significant error making it into the final version approaches 1. And as with any self-respecting limit in an undergraduate calculus class, you don’t have to get anywhere near infinity for the approach to be well under way—the 100,000 words of the average book is definitely big enough.
I’ve written 23 books and several dozen long research papers in my career, and the same thing has happened every time. So I suppose I should have been prepared for it to happen again, when the first copy of my new book The Maths Gene arrived on my desk earlier this week. (This is the British edition. The American translation—The Math Gene (no “s”)—won’t come out until August. Publishing is like that. Don’t ask why. As the theatrical producer kept remarking in the movie Shakespeare in Love, it’s a mystery.) But, experienced as I was, that first glaring slip still caught me off guard. As I was idly flicking through the book for the very first time, the following passage leapt out at me and grabbed me by the throat:
A classic example is the birthday problem, which asks how many people you need to have at a party so that there is a better-than-even chance that two of them will share the same birthday. Most people think the answer is 183, the smallest whole number larger than 365/2. In fact, you need just 23. The answer 183 is the correct answer to a very different question: How many people do you need to have at a party so that there is a better-than-even chance that one of them will share your birthday? If there is no restriction on which two people will share a birthday, it makes an enormous difference.
Now the birthday problem is one of my favorite mathematical examples. I must have written and talked about it dozens of times. I could probably give a short lecture on it in my sleep. So how come that above passage slipped into print? For it’s quite wrong. Not the 23 part. Surprising though that number is, you really do need just 23 people at the party to have a better than 0.5 probability that two people will share the same birthday. But the number of people you need to have present for there to be a better-than-evens chance of someone sharing your birthday is not 183, but the much larger 254. (Yes, really, 254, including yourself.)
How did such a howler find its way into the text and go unnoticed? Before I try to answer that, here are the relevant computations to establish those two answers of 23 and 254.
First, the coincidence of two birthdays. It turns out to be easier to compute the probability that no two people at the party have the same birthday, and then subtract the answer from 1 to obtain the probability that two people will share a birthday. For simplicity, let’s ignore leap years. Thus, there are 365 possible birthdays to consider.
Imagine the people entering the room one-by-one. When the second person enters the room, there are 364 possible days for her to have a birthday that differs from the first person’s. So the probability that she will have a different birthday from the first person is 364/365. When the third person enters, there are 363 possible days for him to have a birthday different from both of the first two, so the probability that all three will have different birthdays is 364/365 x 363/365. When the fourth person enters, the probability of all four having different birthdays is 364/365 x 363/365 x 362/365. Continuing in this way, when 23 people are in the room, the probability of all of them having different birthdays is
364/365 x 363/365 x 362/365 x . . . x 343/365.
This works out to be 0.492. (It is when you have 23 people that the above product first drops below 0.5.) Thus, the probability that at least two of the 23 have the same birthday is 1 – 0.492 = 0.508, better than even.
Now for the problem of the birthdays different from yours. Pick any person at the party. The probability of that person having a birthday different from yours is 364/365. (Again, I’m ignoring leap years, for simplicity.) Thus, if there are n people at the party besides yourself, the probability that they all have a birthday different from yours is (364/365)^n. (Since we don’t have to worry whether their birthdays coincide or not, we don’t have to count down 364, 363, 362, etc. as we did last time.) The first value of n for which the number (364/365)^n falls below 0.5 is n = 253. And that’s all there is to it.
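Both answers are easy to verify numerically. Here is a short sketch (my own check, not part of the original column; the function names are invented for illustration) that recomputes them, ignoring leap years as above:

```python
# Numerical check of the two birthday answers discussed above (leap years ignored).

def shared_birthday_prob(k):
    """Probability that at least two of k people share a birthday."""
    p_all_different = 1.0
    for i in range(k):
        p_all_different *= (365 - i) / 365
    return 1 - p_all_different

def your_birthday_prob(n):
    """Probability that at least one of n other people shares your birthday."""
    return 1 - (364 / 365) ** n

# Smallest party size giving a better-than-even chance of some shared birthday:
k = next(k for k in range(1, 400) if shared_birthday_prob(k) > 0.5)
print(k)  # 23

# Smallest number of other people giving a better-than-even chance that one
# of them shares your birthday (so 254 people in all, counting yourself):
n = next(n for n in range(1, 400) if your_birthday_prob(n) > 0.5)
print(n)  # 253
```

The all-different product for 23 people comes out at about 0.493, in agreement with the 0.492 quoted above.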
That was all pretty easy. A far more difficult question is: How did that error find its way into my book? The answer is, I don’t really know. A manuscript for a trade book (i.e., a book which the publisher thinks will have general appeal) goes through several stages of editing before it goes to press. In addition to the author’s own editing, the commissioning editor for the publisher usually goes through it carefully, generally making changes on each page to improve the wording, and then the copy editor goes through it looking for any remaining grammatical or stylistic errors. Everyone involved tries to shorten and simplify the text to make it as accessible as possible, cutting out any redundant prose. Having written several trade books now, and gone through this process each time, I try to preempt as many editing changes as possible by doing my own editing before sending the manuscript off to the publisher, particularly as production schedules usually mean that the author has only a few days to check over the changes suggested by the two editors. In the case of my book Mathematics: The New Golden Age, I did have to hold up publication for a week or so when I discovered that a particularly keen copy editor, unfamiliar with mathematics, had replaced all my carefully crafted statements of mathematical results with more colloquial forms that, while I agree they read much more smoothly, were logically incorrect. In the case of The Maths Gene, however, I suspect that the original error with the birthday paradox was mine, as I cut down what had started as a rather lengthy discussion of the problem to just a couple of sentences. (I was faced with reducing a 120,000 word manuscript to the 100,000 words I had contracted for with the publisher. To be honest, rough as this process might seem at times, it beats coalmining as a way to make a living.)
Looking back now, I see that an earlier version contained not only discussions of the numbers 23 and 254, but also the crucial passage:
The answer 183 is the correct answer to a very different question: How many different birthdays do you need to have represented at a party so that there is a better-than-even chance that one of them will be your birthday?
Somehow, in the heat of the editing, by the time I got to page 271 (the very last page of the main text), either I was too liberal with my red pen or else I marked up the cuts too scrappily for the typesetter to make sense of. In any event, the error snuck in. The really annoying thing is that I passed over this error on the two or three subsequent times I checked over the manuscript. The law of small errors had struck again. The only redeeming factor on this occasion is that the US edition of the book is not due out until August, so I can correct the error before it has an opportunity to confuse or mislead American readers.
Let me leave you with another probability question. This one is for anyone who sets out to write a book. What do you think is the probability that the published version will contain at least one error? I know what the answer is in my case. It’s unity.
Keith Devlin (devlin@stmarys-ca.edu) is Dean of Science at Saint Mary’s College of California, in Moraga, California, and a Senior Researcher at Stanford University. His latest book The Maths Gene: Why Everybody Has It But Most People Don’t Use It (complete with error), is published in the UK this month by Weidenfeld and Nicolson. The American edition, The Math Gene: How Mathematical Ability Evolved and Why Numbers Are Like Gossip, will be published by Basic Books in August.
MAY 2000
Will the real continuous function please stand up?
What exactly is a continuous function? Here is a typical explanation taken from a university textbook (G. F. Simmons, Calculus with Analytic Geometry, McGraw-Hill, 1985):
In everyday speech, a ‘continuous’ process is one that proceeds without gaps or interruptions or sudden changes. Roughly speaking, a function y = f(x) is continuous if it displays similar behavior, that is, if a small change in x produces a small change in the corresponding value f(x).
As the author observes, this description is “rather loose and intuitive, and intended more to explain than to define.” He goes on to provide a “more rigorous, formal definition,” which I summarize as:
A function f is continuous at a number a if the following three conditions are satisfied:
1. f is defined on an open interval containing a.
2. f(x) tends to a limit as x tends to a.
3. That limit is equal to f(a).
To make this precise, we need to define the notion of a limit:
If a function f(x) is defined on an open interval containing a, except possibly at a itself, we say that f tends to a limit L as x tends to a, where L is a real number, if, for any epsilon > 0, there is a delta > 0 such that:
if 0 < |x – a| < delta, then |f(x) – L| < epsilon
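To make the definition concrete, here is a small numerical illustration (my own sketch, not part of the column) for the function f(x) = x^2 at a = 2, where the limit L is 4. For this f one can take delta = min(1, epsilon/5): if |x - 2| < delta <= 1, then |x + 2| < 5, and hence |x^2 - 4| = |x - 2| |x + 2| < 5 delta <= epsilon.

```python
import random

# Spot-check the epsilon-delta definition for f(x) = x**2 at a = 2 (limit L = 4).

def delta_for(epsilon):
    # A delta that works for this particular f and a (see the argument above).
    return min(1.0, epsilon / 5.0)

def check(epsilon, trials=10_000):
    """Sample x with 0 < |x - 2| < delta and confirm |f(x) - 4| < epsilon."""
    d = delta_for(epsilon)
    for _ in range(trials):
        x = 2 + random.uniform(-d, d)
        if x == 2:
            continue  # the definition only constrains x different from a
        assert abs(x**2 - 4) < epsilon
    return True

print(all(check(eps) for eps in (1.0, 0.1, 0.001)))  # True
```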
With limits defined in this way, the resulting definition of a continuous function is known as the Cauchy-Weierstrass definition, after the two nineteenth century mathematicians who developed it. The definition forms the bedrock of modern real analysis and any standard “rigorous” treatment of calculus. As a result, it is the gateway through which all students must pass in order to enter those domains. But how many of us manage to pass through that gateway without considerable effort? Certainly, I did not, and neither has any of my students in twenty-five years of university mathematics teaching. Why is there so much difficulty in understanding this definition? Admittedly the logical structure of the definition is somewhat intricate. But it’s not that complicated. Most of us can handle a complicated definition provided we understand what that definition is trying to say. Thus, it seems likely that something else is going on to cause so much difficulty, something to do with what the definition means. But what, exactly?
Let’s start with the intuitive idea of continuity that we started out with, the idea of a function that has no gaps, interruptions, or sudden changes. This is essentially the conception Newton and Leibniz worked with. So too did Euler, who wrote of “a curve described by freely leading the hand.” Notice that this conception of continuity is fundamentally dynamic. Either we think of the curve as being drawn in a continuous (sic) fashion, or else we view the curve as already drawn and imagine what it is like to travel along it. This means that our mental conception has the following features:
1. The continuous function is formed by motion, which takes place over time.
2. The function has directionality.
3. The continuity arises from the motion.
4. The motion results in a static line with no gaps or jumps.
5. The static line has no directionality.
Aspects of this dynamic view are still present when we start to develop a more formal definition: we speak about the values f(x) approaching the value f(a) as x approaches a. The mental picture here is one of preserving closeness near a point.
Notice that the formal definition of a limit implicitly assumes that the real line is continuous (i.e., gapless, or a continuum). For, if it were not, then talk about x approaching a would not capture the conception we need. In this conception, a line or a continuum is a fundamental object in its own right. Points are simply locations on lines.
When we formulate the final Cauchy-Weierstrass definition, however, by making precise the notion of a limit, we abandon the dynamic view, based on the idea of a gapless real continuum, and replace it by an entirely static conception that speaks about the existence of real numbers having certain properties. The conception of a line that underlies this definition is that a line is a set of points. The points are now the fundamental objects, not the line. This, of course, is a highly abstract conception of a line that was only introduced in the late nineteenth century, and then only in response to difficulties encountered dealing with some pathological examples of functions.
When you think about it, that’s quite a major shift in conceptual model, from the highly natural and intuitive idea of motion (in time) along a continuum to a contrived statement about the existence of numbers, based on the highly artificial view of a line as being a set of points. When we (i.e., mathematics instructors) introduce our students to the “formal” definition of continuity, we are not, as we claim, making a loose, intuitive notion more formal and rigorous. Rather, we are changing the conception of continuity in almost every respect. No wonder our students don’t see how the formal definition captures their intuitions. It doesn’t. It attempts to replace their intuitive picture with something quite different.
Perhaps our students would have less trouble trying to understand the Cauchy-Weierstrass definition if we told them in advance that it was not a formalization of their intuitive conception—that the mathematician’s formal notion of a continuous function is in fact something quite different from the intuitive picture. Indeed, that might help. But if we are getting into the business of open disclosure, we had better go the whole way and point out that the new definition does not explicitly capture continuity at all. That famous—indeed, infamous—epsilon-delta statement that causes everyone so much trouble does not eliminate (all) the vagueness inherent in the intuitive notion of continuity. Indeed, it doesn’t address continuity at all. Rather, it simply formalizes the notion of “correspondingly” in the relation “correspondingly close.” In fact, the Cauchy-Weierstrass definition only manages to provide a definition of continuity of a function by assuming continuity of the real line!
It is perhaps worth mentioning, if only because some students may have come to terms with the idea that a line is a set of points, that in terms of that conception of a line—which is not something that someone or something can move along—the original, intuitive idea of continuity reduces simply to gaplessness. In short, however you approach it, the step from the intuitive notion of continuity to the formal, Cauchy-Weierstrass definition, involves a huge mental discontinuity.
NOTE: This article is based on the paper Embodied cognition as grounding for situatedness and context in mathematics education, by R. Nunez, L. Edwards, and J. Matos, Educational Studies in Mathematics 39 (1999), pp.45-65.
Keith Devlin (devlin@stmarys-ca.edu) is Dean of Science at Saint Mary’s College of California, in Moraga, California, and a Senior Researcher at Stanford University. His latest book The Maths Gene: Why Everybody Has It But Most People Don’t Use It, has just been published in the UK by Weidenfeld and Nicolson. The American edition, The Math Gene: How Mathematical Ability Evolved and Why Numbers Are Like Gossip, will be published by Basic Books in August.
JUNE 2000
Lottery Mania
On Tuesday, May 9, a kind of frenzy hit the states of Massachusetts, Maryland, Georgia, Illinois, Michigan, New Jersey, and Virginia, as thousands of people flocked into shops and gas stations to buy tickets to the Big Game. No, it wasn’t football, baseball, or basketball. This Big Game is a combined multi-state lottery, open to all adults in any of those seven states. Drawings are held twice a week, and as with most lotteries, the game is designed so that there is a “big” winner every few weeks, perhaps of the order of $10 to $20 million. Unusually, however, for 18 straight drawings starting on March 7, no one won, so that by the time the sun rose on the morning of May 9, the accumulated jackpot had reached $325 million, a US record for a single lottery draw. As a result of a last-minute buying frenzy on May 9, by the time of the draw at 11:00 PM that evening, the actual jackpot had swollen to $363 million. There were reports of people buying up to $3,000 worth of the $1 entry tickets.
There were two winners, one from Illinois, the other from Michigan. They shared the prize equally: $181.5 million each. During the period from the March 7 drawing through the May 9 drawing, Big Game ticket sales in all participating states totaled more than $565 million. Illinois alone sold over $100 million worth of tickets over the life of the jackpot.
The lottery organization makes no secret of your chances of winning. According to their website, the odds against winning the jackpot are around 76 million to 1. The website doesn’t explain how they arrive at that figure, but it’s an easy calculation.
The game requires that you choose 5 different numbers, each between 1 and 50, plus a sixth number between 1 and 36. To win the grand prize, your five numbers have to agree with the five numbers selected by the lottery computer (the order does not matter) and your sixth number has to be the same as the sixth number chosen by the computer. Thus, your odds of winning are:
1 in ([50 x 49 x 48 x 47 x 46] / [5 x 4 x 3 x 2 x 1]) x 36
which works out at 1 in 76,275,360.
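That calculation is easy to reproduce (a quick sketch of mine, using Python’s math.comb for the number of five-number combinations):

```python
from math import comb

main_combinations = comb(50, 5)   # ways to choose 5 different numbers from 1-50
bonus_choices = 36                # the sixth number, between 1 and 36
odds = main_combinations * bonus_choices
print(f"{odds:,}")                # 76,275,360
```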
Presumably that sixth number is there—with a range of 1 to 36—so that the odds work out at around 76 million to 1, a figure that, given the expected number of entries, will guarantee a jackpot winner roughly every two to four weeks. In this way, the lottery organizers can keep the level of interest high.
The math is easy. The hard part is getting your mind around the answer. Just how can you get a sense of odds of such magnitude?
Roughly speaking, those odds are slightly longer than throwing heads on 26 successive tosses of a fair coin. (That unlikely event has odds of 1 in 67,108,864.) Given that most people would be reluctant to bet on getting 5 heads in a row, let alone 26, this comparison makes it clear that the psychology of lotteries has a logic all of its own.
Actually, when it comes to coin tossing, many people’s intuition leads them to an erroneous conclusion that, in a sense, goes against the lottery comparison. Seeing a run of 3 or 4 heads in a row, they believe that the odds of getting a tail are increased—that “it’s time for a tail to come up.” The longer the run of heads, they believe, the more likely it becomes that you’ll get a tail. Not so. This is known as the Gambler’s Fallacy. The coin has no memory. The odds of getting a head or a tail on any one throw remain exactly one-half no matter how many previous throws have resulted in a head. (Assuming the coin is a fair one.)
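A short simulation (my own illustration, not from the column) makes the point: conditioned on a run of four heads, the next toss of a fair coin is still heads about half the time.

```python
import random

# Simulate many fair-coin tosses and look at what happens right after
# every run of four (or more) consecutive heads.
random.seed(1)  # fixed seed, so the run is reproducible
next_tosses = []
streak = 0
for _ in range(1_000_000):
    heads = random.random() < 0.5
    if streak >= 4:              # the previous four tosses were all heads
        next_tosses.append(heads)
    streak = streak + 1 if heads else 0

p = sum(next_tosses) / len(next_tosses)
print(abs(p - 0.5) < 0.02)       # True: the coin has no memory
```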
What other ways might we have of trying to appreciate the odds against winning the Big Game? Well, imagine laying standard playing cards end to end from New York to San Francisco. The underside of just one of those cards is marked. Start to drive across country, and at some point stop and pick up a card. If you’ve chosen the marked card, you win the jackpot. Choose any other card and you lose. How much would you be willing to pay to play this game? In terms of the odds, you’ve just played the Big Game.
Or imagine a standard NFL football field, which has a playing area measuring 100 yards by 55 yards. Somewhere in the field, a student has placed a single, small, common variety of ant that she has marked with a spot of yellow paint. You walk onto the field, blindfolded, and push a pin into the ground. If your pin pierces the marked ant, you win. Otherwise you lose. Want to give it a go? If you do, then in terms of the odds, you’ll be playing the Big Game.
Of course, there is a sure fire way to ensure you win the Big Game. For a stake of $76,275,360, you can buy tickets that cover all possible combinations of numbers, and one of them would be sure to win. With a jackpot of $325 million, this looks like a rock solid way to make a massive profit. There’s still a small risk of losing, however. If four or more other people also pick the winning combination, then you all share the prize equally, and you lose money.
The real problem with this approach, however—leaving aside the small fact that you need $76 million to start with—is the time it would take to buy the tickets. If you were to choose numbers at an average rate of one per second, taking no breaks, and were to work like this 24 hours a day, 7 days a week, all year round, it would take you almost two and a half years to cover all possible combinations.
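That rate arithmetic can be checked in a line or two (a sketch of mine):

```python
combinations = 76_275_360             # total number of Big Game combinations
seconds_per_year = 60 * 60 * 24 * 365
years = combinations / seconds_per_year
print(round(years, 1))                # 2.4: one ticket per second, nonstop
```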
However you look at it, the odds against winning the Big Game jackpot are truly staggering. Does that mean that the best strategy is not to play at all? Oddly enough, the optimal strategy is to play, but to restrict your wager to an amount of money that is truly of no value to you.
For most adults in the United States, $1 really is the same as nothing. If they lose a dollar bill, they think nothing of it. Now, if you don’t enter the lottery, your chances of winning are absolutely zero. You will never win. Ever. If, on the other hand, you buy a single dollar ticket, you have a small but nonzero probability of winning. That’s not merely “slightly” better than having no chance at all; it’s in an entirely different category. (As any sophomore mathematics major knows—or should know—a small, nonzero epsilon is very different from setting epsilon equal to zero.)
Now take into account the fact that many people gain considerable enjoyment from the anticipation of waiting for the lottery draw—of imagining what it would be like to have all that money—and it’s not at all hard to understand why lotteries like the Big Game are so popular. Indeed, the excitement of playing may well be worth the price of a $1 ticket.
What’s that you say? Have I tried my hand at the Big Game? No. The truth is, I’ve never bought a state lottery ticket in my life. Yes, I know I just argued that the optimal strategy is to enter with a small stake. But the fact is, I just can’t get myself past those odds.
Keith Devlin (devlin@stmarys-ca.edu) is Dean of Science at Saint Mary’s College of California, in Moraga, California, and a Senior Researcher at Stanford University. His latest book The Maths Gene: Why Everybody Has It But Most People Don’t Use It, was published in the UK last month by Weidenfeld and Nicolson. The American edition, The Math Gene: How Mathematical Ability Evolved and Why Numbers Are Like Gossip, will be published by Basic Books in August.
JULY-AUGUST 2000
How to sell soap
The studio boss looked at the fresh and eager faces seated around the long conference table at Plato Television Studios. Jabbing his long, fat cigar toward them to punch home each point, he began:
“You guys are the best series scriptwriters in town. I’ve invited you here today to offer you the chance of a lifetime: to work on the best television series the world has ever seen. Here’s the outline of what I have in mind.”
He glanced at the young woman sitting at the far end of the table by the slide projector. “Alice, can you give us the first slide, please?”
Alice dimmed the lights and switched on the projector. The studio boss continued in the half light:
“There are these two families, you see: the Points and the Lines. Basically, the series is about what these two families do, and the relationship between them. One of the great things about this idea is that I’ve set it up so the series is going to be really cheap to make. As you can see on the slide, we don’t need to hire any actors for the Points, because the Points have no parts.”
The studio boss used his cigar to point at the first sentence on the screen, which read:
Points have no parts.
“The idea,” he continued, “is to use digital special effects to represent the Points. Those computer graphics guys are cheap these days—every time the aerospace industry downsizes or a computer company is bought out by Microsoft, another thousand of them are thrown out of work and we can pick them up for a song.”
There was an audible sigh of relief around the room as each of the eleven scriptwriters realized how easily they could have been forced to change careers in mid-life, but for the fortunate accident that they had flunked Algebra 2 in high school and studied English Lit. instead.
“We’ll also save a ton of money on the Lines,” the studio boss beamed, obviously pleased at his ingenuity. “My idea is that a Line has no breadth. That means we don’t need trained actors. We can use more of those out-of-work schmucks from aerospace.” All eyes turned to the second item written on the slide, as the studio boss jabbed at it with his cigar. It read:
A Line has no breadth.
“Now, you guys are the best, so I’m going to give you a lot of freedom on this project,” the boss growled. “All you have to do is stick to five guiding principles for the way I want the series to go. Next slide, Alice.”
Everyone looked as the next slide appeared on the screen.
“I’ll give you each a copy of this,” said the studio boss, “but let me summarize the five points you can see.”
“Item one. You can have a Line that connects any two Points. That should be clear enough.”
“Number two. You can continue any Line as long as you want. No problem there, either. Hell, they do that in lots of long running series.”
“The next one might need a bit of explaining. As you can see, what it says is ‘Any Point can be the center of a Circle of any size.’ My idea is that each episode can center around a particular person in the Point family. That episode will concentrate on the circle of friends of that person. Some weeks, it might be a small circle, other weeks a really big one—since the circle will be made up of Points, it doesn’t make any difference to the budget.”
There was a chorus of appreciation around the table as the scriptwriters began to see the potential of the studio boss’s overall idea to keep the budget down.
“Next one: ‘All right angles are the same.’ Every series has to have an angle. You all learned that in Television 101. You also know that some angles are right for the intended audience, some are wrong. In my series, there’s just one right angle. All other angles are wrong, and anyone who tries to introduce one will be off the show faster than I can say ‘soap.’ Understood?”
The studio boss looked around, daring anyone to respond to his challenge. No one did. The thought of all those unemployed former aerospace workers was still fresh in their minds, and they did not want to join them in the line for unemployment benefit.
“I’m not entirely sure I need the fifth guideline,” said the studio boss, who was clearly enjoying showing off the genius of his new idea. “It might be superfluous, given the other four. But I put it down just to be sure. It’s a bit hard to follow—I’ll get Alice to work on the text. But what it boils down to in simple terms is this: If one of you sets it up so that two of the Lines are not supposed to meet, then no matter who else takes over the storyline in a later episode, those two Lines are still not going to meet. Ever! Capisce?”
Everyone nodded. The boss leant back in his chair and chewed on his cigar.
“That’s it. Any questions?”
There was silence for a moment, then a young woman halfway along the table raised her hand. “I think it’s a great concept,” she began, “but I’ve got one question.”
“Fire away,” replied the studio boss.
“How long do you expect this to run? Thirteen weeks? Fifty-two? Or are you planning on something that goes on for years, like As the World Turns or General Hospital?”
The studio boss chuckled. “Little lady, I don’t think you’ve quite got the message as far as my overall concept is concerned.” He gestured toward the five guiding principles on the screen. “This idea is so insanely great, it has so much potential, it’s going to change the world. Believe me, once it catches on and we get enough sponsors, this baby is going to run for thousands of years. Or my name’s not Euclid.”
NOTE: The above is taken intact from Devlin’s new book The Math Gene: How Mathematical Thinking Evolved and Why Numbers Are Like Gossip, to be published by Basic Books in August. The UK edition, The Maths Gene: Why Everybody Has It But Most People Don’t Use It, was published in April by Weidenfeld and Nicolson.
Devlin’s Angle is updated at the beginning of each month.
Keith Devlin ( devlin@stmarys-ca.edu) is Dean of Science at Saint Mary’s College of California, in Moraga, California, and a Senior Researcher at Stanford University.
SEPTEMBER 2000
The Strange Case of Emily X
Readers in their fifties and older will probably be able to identify the person I am calling “Emily X.” In the 1960s, Emily was a brilliant and vivacious mathematics student at the University of California at Berkeley. Both her parents were physicians at the nearby Oakland Children’s Hospital, and she had a younger brother, Todd, then still in high school. Her mathematical ability was so great that as a sophomore she was already taking graduate classes in addition to her undergraduate courses, and her professors had begun steering her towards graduate school. Her future seemed assured.
Then, in her junior year, Emily disappeared. There was no note, no message to her friends. Her parents and younger brother were devastated. Days turned into weeks and months, and eventually the newspapers dropped the story. A year went by, then two, then three, and still there was no clue as to her whereabouts. Most people who knew her feared the worst—that she had been abducted and murdered.
Then, almost five years to the day after she vanished, Emily reappeared. She simply walked into her former home, poured a glass of milk, made a peanut butter and jelly sandwich, switched on the television, and sat down to wait for her parents to come home from work.
She looked fit and healthy, she showed no signs of any physical abuse, and she was outwardly happy. But she had absolutely no recollection of having been away, or indeed of the preceding five years. She did not know that President Kennedy had been shot. She had never heard of The Beatles. She thought it was still 1962.
The only apparent change was one that only her former mathematics professors could spot. Emily’s mathematical ability had increased dramatically. Five years earlier, she had been a very promising sophomore. Now she was on a par with the members of Berkeley’s world-famous mathematics faculty.
How could this have happened? It soon became apparent that she had not attended any other university. Even if she had studied in a foreign country, as some commentators suggested, it could not account for her total inability to remember anything but the mathematics she had learned.
It took Emily about six months of intense effort to fill in the missing years. During this time, she resumed her studies. Or rather, she worked alongside her former teachers as they tackled major unsolved problems of mathematics.
Then, after she had “caught up” with the world, two major events occurred that again changed her life.
First, she woke up one morning and found she had apparently lost all her mathematical ability. She could still do arithmetic, and she could carry out the symbolic manipulations of elementary algebra as well as any bright high school student. But she could not follow even the simplest mathematical proof—it was as if she did not understand what a proof was.
Second, that night, she experienced the first of many mental flashbacks that would eventually allow her to piece together her experiences during what she called her “lost years.”
Thus far, everything is well documented. The doubts occur when Emily tells of her life during her five “lost years.” Her descriptions of life in “a cold place, with snow everywhere, and a sky of shining silver” (from her autobiography, My Lost Years) are so detailed, and so logically consistent, that it is hard to imagine that they are fabricated. Yet she describes a world most of us would dismiss as science fiction.
Though Emily eventually recovered sufficiently to live an outwardly normal life, she never married or formed any close personal relationships. This is particularly significant, given that her descriptions of her lost years are filled with highly detailed accounts of friendships, love affairs, weddings, and marriages. Only much later, when she had retrieved the more painful memories from her lost years, did it become clear what had led to this interest.
Some of the interpersonal relationships in Emily’s other world are straightforward enough, others seem strange. For instance, in one section she describes (in extreme detail) her “other world” friends Janet and Eric, a married couple, and how they fell in love with, and then married, a young man called Paul. Three-way marriages were apparently quite normal in Emily’s other world. In her own words [My Lost Years, p.193]:
“The law does not prevent marriages of any size, but financial concerns do. In a three-way marriage, for instance, the new partner has exactly the same rights as the first two partners. When Paul married Janet and Eric, he had the same rights as they did. It would have been just the same if Eric had married Paul first and then they had married Janet. Being the first in a marriage offers no legal advantages. Couples definitely give something up if they marry a third person. Of course, they gain a lot as well. But most decide not to.”
Elsewhere, Emily describes her own “empty marriage.” The term turns out to have a different meaning in Emily’s other world than it has in the all-too-familiar world of failed marriages. An “empty marriage,” Emily explains, was a marriage ceremony in which a person married a state-appointed individual called a “tolen.” Though legally valid, the ceremony does not affect either party’s legal status. Its only purpose is to enable the individual to experience a marriage ceremony.
Many readers will remember Emily’s appearance on The Tonight Show with Johnny Carson. Carson recounts the interview in his own autobiography [Here’s Johnny, p.207]:
“What about divorce?” I asked at one point in the conversation.
“Oh yes, of course there’s divorce,” Emily replied.
“Maybe you were living in Nevada!” I quipped. She did not respond to my humor, so I asked another question: “Tell me, how do you go about getting a divorce in your world? Is it easy?”
“No, it’s not easy at all,” Emily replied. “Divorce is just a special kind of marriage. You have to find your spouse’s nullifier. Then you go through a normal marriage ceremony with the nullifier. Then you are no longer married.”
I remarked that this didn’t sound difficult at all, and joked that Elizabeth Taylor did this all the time, but Emily ignored my attempt at humor. She went on to explain the process step-by-step, as if she were telling a child how to boil an egg. “Everyone has a nullifier,” she said, “but you only have one of them, so the difficult thing is to find him. It can take a lot of time. Maybe you’ll never find him. That can happen. Of course, there are agencies that specialize in locating nullifiers . . .”
By now the producer was waving furiously at me to liven things up, and I had to prevent the interview from turning into a lecture. “I’ll bet they are expensive,” I broke in again.
Unfortunately, once again Emily did not pick up on my attempt to inject a bit of humor. She simply confirmed, in a very matter-of-fact way, that divorces were both expensive and difficult to arrange. The producer was getting desperate. I tried again:
“Then I guess you were not living in Nevada after all!”
This time it worked. She picked up on my intention. “No, I’m sure it wasn’t Nevada,” she replied with a laugh. It was a great laugh.
At that point, the show came to an end. Emily’s final laugh left a positive impression on the audience. She could have come across sounding like a freak, and she almost did, but in the end she didn’t. That last remark saved her. And the show.
Looking back, it was one of my favorite shows. Emily was a fascinating person. I don’t have any record of the rest of our conversation, but it was probably the longest I have ever spent with a guest after we had gone off the air.
But if Emily had not spent her lost years in Nevada, where had she been? Many dismiss Emily’s story as the delusions of a schizophrenic. But they cannot explain the fact that for five years not a single person saw her or heard from her, and that during that same period, she went—untutored by any human being—from being a bright mathematics undergraduate to a world class research mathematician. You can’t fake mathematics.
Or can you? For anyone who is unable to figure out the significance of the Emily X story, I offer my own explanation in my new book The Math Gene: How Mathematical Thinking Evolved and Why Numbers Are Like Gossip (published in the USA by Basic Books), from which this month’s column is extracted.
OCTOBER 2000
Do software engineers need mathematics?
Software engineers often proclaim that they never use any of the mathematics they learned in college. Come to that, they say they don’t use much of the computer science they learned either. As a mathematician, I’ll leave it to my CS colleagues to respond to the latter allegation. (At least, I would if all my CS colleagues hadn’t left academia to work for an Internet startup!) As far as the use of mathematics is concerned, however, let me respond. In my view, those software engineers are dead right: They don’t use their college mathematics.
Having got that off my chest, let me go on to say that they are also dead wrong. They make use of their college mathematics education every day.
There’s no paradox here. It comes down—in Clintonesque fashion—to what you mean by that word “use.” One meaning is the one those software engineers will have encountered in their math classes. For example, having learned the rule for integration by parts in their calculus class, they were then given exercises and exam questions that required them to use that rule. This is the most familiar meaning attached to the word “use”, and it’s the one the engineers implicitly assume when they say they never use their college math. But it’s a meaning that is built on what I call the “filling a vessel” view of the way humans learn.
According to the “filling a vessel” view, education consists largely of pouring facts into our brains, and using what we have learned consists of pouring it back out. That is, dare I say it, a highly simplistic—and erroneous—view of education. But it’s one that the education establishment (which I’m in) fosters every time it offers a course and then measures the results by setting a largely regurgitative, three-hour, written exam.
In contrast, all the evidence from several decades of research both into the way the brain works and into the learning process—and there is masses of such evidence—says that the acquisition of facts and algorithmic procedures is merely a surface manifestation of what goes on when people learn. (We know they are surface phenomena since we generally forget them soon after the last exam is over.) The real value of education is something else. Our brains are perhaps the world’s best examples of an adaptive system. When we subject the human brain to an extended educational experience, it undergoes permanent changes. In physical terms, those changes are the growth and strengthening of certain neural pathways. In functional and experiential terms, we acquire new knowledge and skills. The more repetitive the learning process, the stronger and longer lasting are those changes.
The effect of repetitive learning is nowhere more dramatic than in mathematics. Formal mathematics is at most 5,000 years or so old. That’s a mere blink of an eye in evolutionary time, and certainly not long enough for our brains to undergo any but the most minor changes. Thus, the mental processes we use to do mathematics must have been acquired and in use long before the Sumerians introduced abstract numbers some time between 8,000 and 5,000 years ago. The new twist required in order to do mathematics was to bring those capacities together and use them to reason not about the physical and social world for which they initially developed through natural selection, but rather about a purely abstract world of the mind’s own creation.
The human brain finds it extremely hard to cope with a new level of abstraction. This is why it was well into the eighteenth century before mathematicians felt comfortable dealing with zero and with negative numbers, and why even today many people cannot accept the square root of minus-one as a genuine number.
But software engineering is all about abstraction. Every single concept, construct, and method is entirely abstract. Of course, it doesn’t feel that way to most software engineers. But that’s my point. The main benefit they got from the mathematics they learned in school and at university was the experience of rigorous reasoning with purely abstract objects and structures.
Moreover, mathematics was the only subject that gave them that experience. It’s not what was taught in the mathematics class that was important; it’s the fact that it was mathematical. In everyday life, familiarity breeds contempt. But when it comes to learning how to work in a highly abstract realm, familiarity breeds a sense of, well, familiarity—meaning that what once seemed abstract starts to feel concrete, and thus more manageable.
Though the payoff from learning (any) mathematics is greater for the computer professional than most other people, in today’s society the benefits affect everyone. For instance, a study carried out by the US Department of Education in 1997 (“the Riley Report”) showed that students who complete a rigorous high school course in algebra or geometry do much better in terms of gaining entry to college or university, and perform much better once they are there, whatever they choose to study. In other words, completion of a rigorous course in mathematics—it is not even necessary that the student does well in such a course—appears to be an excellent means of sharpening the mind and developing mental skills that are of general benefit.
In my recent book The Math Gene: How Mathematical Thinking Evolved and Why Numbers Are Like Gossip I examine these ideas more fully, fleshing out just which mental capacities are required to do mathematics, and identifying possible survival advantages that led to those capacities being selected for in our ancestors. I also show how acquisition of the ability to cope with abstraction—to marshal mental capacities developed to handle concrete objects and real-world circumstances, and apply them to abstract entities—confers some benefits on everyone. As I have tried to indicate in this short essay, the benefits for the software engineer are far greater. Indeed, they are an essential prerequisite. That’s not usually given as the “official reason” for the obligatory “math requirements” for engineering students. But it is, I suggest, the main reason why they are beneficial.
NOVEMBER 2000
The perplexing mathematics of presidential elections
When Slobodan Milosevic claimed victory in the Yugoslavian elections last month, the opposition cried foul and the population took to the streets in protest, eventually forcing Milosevic to admit defeat and stand down. Whichever of Gore and Bush is able to claim victory in this month’s US Presidential election, it is unlikely that there will be similar cries that the process was unfair. After all, everyone knows—don’t we?—that, dubious campaign gifts, negative ads, and occasional dirty tricks notwithstanding, when it comes to the actual election process, the US electoral system is as fair as can be. One person one vote, with victory going to the candidate with the most votes. Who can possibly object to that?
Well, John McCain could, for one. And in theory at least, so could Ralph Nader. (More on both of them later.) So too could anyone who takes a look at the mathematics of voting. It’s not the idea of one person one vote that’s the problem, it’s the math that is used to turn those votes into a final decision. Ideally, that math should reflect the wishes of the electorate. But does it?
The answer usually comes as a surprise to most people. There are, in fact, several different ways to do the math, and they often lead to very different outcomes. That’s right: there’s a choice of how to do the math!
The electoral math used in the United States election process counts votes using a system known as plurality voting. In this system, also known as “first-past-the-post,” the candidate with the most votes is declared the winner. Now in an election where there are just two candidates, that system works just fine. It’s when there are three or more candidates that problems can arise. Plurality voting can result in the election of a candidate whom almost two-thirds of voters detest.
For instance, in 1998, in a three-party race, plurality voting resulted in the election of former wrestler Jesse Ventura as Governor of Minnesota, despite the fact that only 37% of the electors voted for him. The almost two-thirds of electors who voted Democrat or Republican had to come to terms with a governor whom none of them wanted—or expected. Judging by the comments immediately after the election, the majority of Democrat and Republican voters were strongly opposed to Reform Party candidate Ventura moving into the Governor’s mansion. If so, he won not because the majority of voters chose him, but because plurality voting effectively thwarted the will of the people. Had the voters been able to vote in such a way that, if their preferred candidate were not going to win, their preference between the remaining two could be counted, the outcome could have been quite different.
For instance, several countries, among them Australia, the Irish Republic, and Northern Ireland, use a system called single transferable vote. Introduced by Thomas Hare in England in the 1850s, this system takes account of the entire range of preferences each voter has for the candidates. All electors rank all the candidates in order of preference. When the votes are tallied, the candidates are first ranked based on the number of first-place votes each received. The candidate who comes out last is dropped from the list. This, of course, effectively “disenfranchises” all those voters who picked that candidate. So, their vote is automatically transferred to their second choice of candidate—which means that their vote still counts. Then the process is repeated: the candidates are ranked a second time, according to the new distribution of votes. Again, the candidate who comes out last is dropped from the list. With just three candidates, this leaves one candidate, who is declared the winner. In a contest with more than three candidates, the process is repeated one or more additional times until only one candidate remains, with that individual winning the election. Since each voter ranks all the candidates in order, this method ensures that at every stage, every voter’s preferences among the remaining candidates is taken into account.
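The elimination-and-transfer procedure just described can be sketched in a few lines of Python. This is a toy illustration with hypothetical candidates A, B, and C; real STV rules also cover tie-breaking and multi-seat races, which the sketch ignores:

```python
from collections import Counter

def single_transferable_vote(ballots):
    """Each ballot lists candidates in order of preference. Repeatedly
    drop the candidate with the fewest first-place votes, letting each
    ballot count for its highest-ranked surviving candidate."""
    remaining = {c for ballot in ballots for c in ballot}
    while len(remaining) > 1:
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        remaining.discard(min(remaining, key=lambda c: tally[c]))
    return remaining.pop()

# Hypothetical 9-voter race: A leads on first-place votes (4 of 9),
# but once C is eliminated, C's two voters push B past A.
ballots = [["A", "C", "B"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
winner = single_transferable_vote(ballots)  # "B", though plurality picks "A"
```

Note how the transfer step changes the outcome: plurality would declare A the winner, while STV elects B once the lower preferences are consulted.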
An alternative system that avoids the kind of outcomes of the 1998 Minnesota Governor’s race is the Borda count, named after Jean-Charles de Borda, who devised it in 1781. Again, the idea is to try to take account of each voter’s overall preferences among all the candidates. As with the single transferable vote, in this system, when the poll takes place, each voter ranks all the candidates. If there are n candidates, then when the votes are tallied, the candidate receives n points for each first-place ranking, n-1 points for each second place ranking, n-2 points for each third place ranking, down to just 1 point for each last place ranking. The candidate with the greatest total number of points is then declared the winner.
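In code, the Borda tally is just a weighted sum over each ballot. A minimal sketch, using made-up three-candidate ballots:

```python
def borda_count(ballots):
    """With n candidates, a ballot awards n points to its first choice,
    n-1 to its second, and so on down to 1 point for its last choice."""
    n = len(ballots[0])
    scores = {}
    for ballot in ballots:
        for rank, candidate in enumerate(ballot):
            scores[candidate] = scores.get(candidate, 0) + (n - rank)
    return scores

# Five hypothetical voters ranking three candidates:
ballots = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
scores = borda_count(ballots)
# A: 3*3 + 2*1 = 11, B: 3*2 + 2*3 = 12, C: 3*1 + 2*2 = 7 -> B wins,
# even though A has more first-place votes.
```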
Yet another system that avoids the Jesse Ventura phenomenon is approval voting. Here the philosophy is to try to ensure that the process does not lead to the election of someone whom the majority opposes. Each voter is allowed to vote for all those candidates of whom he or she approves, and the candidate who gets the most votes wins the election. This is the method used to elect the officers of both the American Mathematical Society and the Mathematical Association of America.
To see how these different systems can lead to very different results, let’s consider a hypothetical scenario in my home state of California, where Green Party candidate Ralph Nader is expected to do well. Suppose that, on November 7, 15 million Californians go to the polls, and that their preferences between the three main candidates are as follows:
6 million rank Bush first, then Nader, then Gore.
5 million rank Gore first, then Nader, then Bush.
4 million rank Nader first, then Gore, then Bush.
If the votes are tallied by the plurality vote—the present system—then Bush’s 6 million (first-place) votes make him the clear winner. And yet, 9 million voters (60% of the total) rank him dead last! That hardly seems fair.
What happens if the votes are counted by the single transferable vote system—the system used in Australia and Ireland? The first round of the tally process eliminates Nader, who is only ranked first by 4 million voters. Those 4 million voters all have Gore as their second choice, so in the second round of the tally process their votes are transferred to Gore. The result is that, in the second round, Bush gets 6 million first place votes while Gore gets 9 million. Thus, Gore wins by a whopping 9 million to 6 million margin.
But wait a minute. Looking at the original rankings, we see that 10 million voters prefer Nader to Gore—that’s 66% of the total vote. Can it really be fair for such a large majority of the electorate to have their preferences ignored so dramatically?
Thus, both the plurality vote and single transferable vote lead to results that run counter to the overwhelming desires of the electorate. What happens if we use the Borda count? Well, with this method, Bush gets
6m x 3 + 5m x 1 + 4m x 1 = 27m points,
Gore gets
6m x 1 + 5m x 3 + 4m x 2 = 29m points,
and Nader gets
6m x 2 + 5m x 2 + 4m x 3 = 34m points.
The result is a decisive win for Nader, with Gore coming in second and Bush trailing in third place.
What happens with approval voting? Well, as I have set up the problem so far, we don’t have enough information—we don’t know how many electors actively oppose each particular candidate. Let’s assume that the Gore supporters and the Nader supporters could live with the others’ candidate, but the voters in both groups really don’t want to see Bush in the White House. (This is not at all an unreasonable supposition, given the voting preferences we started with, but remember that this is a purely hypothetical example.) In this case, Nader gets 15 million votes, Gore gets 9 million votes, and Bush gets a mere 6 million. All in all, it’s beginning to look as though Nader is the one who should receive the Electoral College’s votes for California.
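These tallies are easy to check mechanically. The sketch below recomputes the plurality, Borda, and approval results for the hypothetical 6/5/4-million profile; the approval sets are the assumption stated above, encoded as each bloc approving its top two choices (so the Gore and Nader blocs both withhold approval from Bush):

```python
# One tuple per voting bloc: (millions of voters, ranking first -> last).
blocs = [
    (6, ["Bush", "Nader", "Gore"]),
    (5, ["Gore", "Nader", "Bush"]),
    (4, ["Nader", "Gore", "Bush"]),
]

# Plurality: only first-place votes count.
plurality = {}
for weight, ranking in blocs:
    plurality[ranking[0]] = plurality.get(ranking[0], 0) + weight
# -> Bush 6, Gore 5, Nader 4: Bush wins.

# Borda: 3 points for first place, 2 for second, 1 for third.
borda = {}
for weight, ranking in blocs:
    for rank, c in enumerate(ranking):
        borda[c] = borda.get(c, 0) + weight * (3 - rank)
# -> Bush 27, Gore 29, Nader 34: Nader wins.

# Approval (assumed sets): each bloc approves its top two choices.
approvals = [
    (6, {"Bush", "Nader"}),
    (5, {"Gore", "Nader"}),
    (4, {"Nader", "Gore"}),
]
approval = {}
for weight, approved in approvals:
    for c in approved:
        approval[c] = approval.get(c, 0) + weight
# -> Nader 15, Gore 9, Bush 6: Nader wins again.
```

Three tallying rules, one electorate, two different winners.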
Faced with such confusion in how to count votes in elections with three or more candidates, it’s tempting to say that the only fair way to decide the issue is to choose the individual who would beat every other candidate in head-to-head, two-party contests. This approach was suggested by the Marquis de Condorcet in 1785, and as a result is known today as the Condorcet system.
For the scenario in our example, Nader also wins according to the Condorcet system. He gets at least 10 million votes in a straight Nader-Gore contest and at least 9 million votes in a Nader-Bush match-up, in either case a majority of the 15 million voters. Unfortunately, although it works for this example, and despite the fact that it has considerable appeal, the Condorcet method suffers from a major disadvantage: it does not always produce a clear winner!
For example, suppose the Californian voting profile were as follows:
5 million rank Bush first, then Gore, then Nader.
5 million rank Gore first, then Nader, then Bush.
5 million rank Nader first, then Bush, then Gore.
Then 10 million Californian voters prefer Bush to Gore, so Bush would easily win a Bush-Gore battle. Also, 10 million voters prefer Gore to Nader, so Gore would romp home in a Gore-Nader contest. The remaining two-party match-up would pit Bush against Nader. But when we look at the preferences, we see that 10 million people prefer Nader to Bush, so Nader comes out on top in that contest. In other words, there is no clear winner. Each candidate wins one of the three possible two-party battles!
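A small helper makes the head-to-head arithmetic explicit. A sketch (bloc weights in millions): on the earlier 6/5/4 profile it finds Nader as the Condorcet winner, while on the cyclic 5/5/5 profile it finds no winner at all:

```python
def condorcet_winner(blocs, candidates):
    """Return the candidate who beats every rival head-to-head,
    or None if the pairwise results form a cycle."""
    def votes_for(a, b):
        # Total weight of blocs ranking a above b.
        return sum(w for w, r in blocs if r.index(a) < r.index(b))
    for a in candidates:
        if all(votes_for(a, b) > votes_for(b, a)
               for b in candidates if b != a):
            return a
    return None

candidates = ["Bush", "Gore", "Nader"]
first_profile = [(6, ["Bush", "Nader", "Gore"]),
                 (5, ["Gore", "Nader", "Bush"]),
                 (4, ["Nader", "Gore", "Bush"])]
cyclic_profile = [(5, ["Bush", "Gore", "Nader"]),
                  (5, ["Gore", "Nader", "Bush"]),
                  (5, ["Nader", "Bush", "Gore"])]

condorcet_winner(first_profile, candidates)   # "Nader"
condorcet_winner(cyclic_profile, candidates)  # None: the pairings cycle
```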
So what do we do next? Faced with such a confusing state of affairs, the obvious thing is to abandon all of the methods we have looked at and search for an alternative approach. After all, there must be a fair way to count the votes in an election, mustn’t there?
Sadly—and surprisingly—the answer is no. In 1950, the Stanford economist Kenneth Arrow made a startling mathematical discovery—a discovery for which he was subsequently awarded the Nobel Prize in Economics. Suppose, said Arrow, that we want to find a way of tallying the votes in an election. What kinds of conditions must that tallying system satisfy in order for it to give a fair outcome? One obvious condition is that if every voter prefers candidate A over candidate B, then the final ranking produced by the tally system should place A above B. Another obvious requirement is that if the tally system puts candidate A above candidate B, then that ordering between A and B should remain the same if one or more voters changes their mind about some third candidate C.
All right, you say, so what? Why beat about the bush (not George, this time) stating—in the words of Basil Fawlty (John Cleese)—the bleeding obvious? Here’s why. Arrow proved that there is only one vote-tallying system that satisfies those two seemingly innocuous, and eminently desirable, requirements: One person is appointed as a dictator and he or she rules absolutely. And that’s as far away from the idea of democracy as you can get! In other words, if it’s democracy you want, there is no fair way to tally the votes in an election.
I should stress that Arrow’s theorem doesn’t just say that none of the tallying systems that have been devised so far is fair. Arrow proved that no fair system can possibly exist. Period.
Thus, the best we can hope for is to pick the best of a range of imperfect election tallying systems. But how do we make that choice? Things might not be so bad if mathematicians themselves agreed which system is best. Unfortunately, pretty well the only thing everyone does agree on is that the present system—plurality voting—is the worst, and any of the other systems described here would do a better job of representing the preferences of the electorate.
Lest I have given the impression that the single transferable vote and the Borda count are without their problems, let me rectify that misapprehension right away. One worrying problem with the single transferable vote is that if some voters increase their evaluation of a particular candidate and raise him or her in their rankings, the result can be—paradoxically—that the candidate actually does worse! For example, consider an election in which there are four candidates, A, B, C, D, and 21 electors. Initially, the electors rank the candidates like this:
7 voters rank: A B C D
6 voters rank: B A C D
5 voters rank: C B A D
3 voters rank: D C B A
In the first round of the tally, the candidate with the fewest first-place votes is eliminated, namely D. After D’s votes have been redistributed, the following ranking results:
7 voters rank: A B C
6 voters rank: B A C
5 + 3 = 8 voters rank: C B A
Then B is eliminated, leading to the new ranking:
7 + 6 = 13 voters rank: A C
8 voters rank: C A
Thus A wins the election.
Now suppose that the 3 voters who originally ranked the candidates D C B A change their mind about A, moving him from their last place choice to their first place: A D C B. These voters do not change their evaluation of the other three candidates, nor do any of the other voters change their rankings of any of the candidates. But when the votes are tallied this time, the end result is that B wins. (If you don’t believe this, just work through the tally process one round at a time. The first round eliminates D, the second round eliminates C, and the final result is that 10 voters prefer A to B and 11 voters prefer B to A.)
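Running the two 21-voter elections through a minimal STV tally confirms the paradox. A sketch; `stv_winner` simply re-implements the elimination rounds described above:

```python
from collections import Counter

def stv_winner(ballots):
    """Drop the candidate with the fewest first-place votes each round,
    counting each ballot for its highest-ranked surviving candidate."""
    remaining = {c for ballot in ballots for c in ballot}
    while len(remaining) > 1:
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        remaining.discard(min(remaining, key=lambda c: tally[c]))
    return remaining.pop()

# The 21-voter election from the text...
before = ([["A", "B", "C", "D"]] * 7 + [["B", "A", "C", "D"]] * 6 +
          [["C", "B", "A", "D"]] * 5 + [["D", "C", "B", "A"]] * 3)
# ...and the same election after 3 voters promote A from last to first.
after = ([["A", "B", "C", "D"]] * 7 + [["B", "A", "C", "D"]] * 6 +
         [["C", "B", "A", "D"]] * 5 + [["A", "D", "C", "B"]] * 3)

stv_winner(before)  # "A"
stv_winner(after)   # "B": gaining support cost A the election
```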
For all the advantages offered by the single transferable vote system, the fact that a candidate can actually harm her chances by increasing her voter appeal—to the point of losing an election that she would otherwise have won—leads some mathematicians to conclude that the method should not be used.
The Borda count has at least two weaknesses. First, it is easy for blocks of voters to manipulate the outcome. For example, suppose there are 3 candidates A, B, C and 5 electors, who initially rank the candidates:
3 voters rank: A B C
2 voters rank: B C A
The Borda count for this ranking is as follows:
A: 3×3 + 2×1 = 11
B: 3×2 + 2×3 = 12
C: 3×1 + 2×2 = 7
Thus, B wins. Suppose now that A’s supporters realize what is likely to happen and deliberately change their ranking from A B C to A C B. The Borda count then changes to:
A: 11; B: 9; C: 10.
This time, A wins. By putting B lower on their lists, A’s supporters are able to deprive him of the victory he would otherwise have had.
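The effect of this tactical reordering is easy to verify. A sketch; the hypothetical `borda_scores` helper applies the n-down-to-1 weighting to weighted ballot blocs:

```python
def borda_scores(blocs):
    """Borda tally over (weight, ranking) blocs: with n candidates, a
    ballot gives n points to its first choice down to 1 for its last."""
    scores = {}
    for weight, ranking in blocs:
        n = len(ranking)
        for rank, c in enumerate(ranking):
            scores[c] = scores.get(c, 0) + weight * (n - rank)
    return scores

honest = [(3, ["A", "B", "C"]), (2, ["B", "C", "A"])]
buried = [(3, ["A", "C", "B"]), (2, ["B", "C", "A"])]  # A's bloc demotes B

honest_scores = borda_scores(honest)  # A: 11, B: 12, C: 7 -> B wins
buried_scores = borda_scores(buried)  # A: 11, B: 9, C: 10 -> A wins
```

Nothing about A's supporters' opinion of A changed; only their reported ranking of B did.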
Of course, almost any method is subject to strategic voting by a sophisticated electorate, and Borda himself acknowledged that his system was particularly vulnerable, commenting: “My scheme is intended only for honest men.” Somewhat more worrying to the student of electoral math is the fact that the entry of an additional candidate into the race can dramatically alter the final rankings, even if that additional candidate has no chance of winning, and even if none of the voters changes their rankings of the original candidates. For example, suppose that there are 3 candidates, A, B, C, in an election with 7 voters. The voters rank the candidates as follows:
3 voters rank: C B A
2 voters rank: A C B
2 voters rank: B A C
The Borda count for this ranking is:
A: 13; B: 14; C: 15.
Thus, the candidates’ final ranking is C B A. Now candidate X enters the race, and the voters’ ranking becomes:
3 voters rank: C B A X
2 voters rank: A X C B
2 voters rank: B A X C
The new Borda count is:
A: 20; B: 19; C: 18; X: 13.
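The same straightforward Borda scoring (n points for first place down to 1 for last), sketched here in Python, confirms both counts:

```python
def borda(ballots):
    """Borda count: with n candidates, each ballot awards n points to
    its first choice, n-1 to its second, and so on down to 1."""
    n = len(ballots[0])
    scores = {}
    for ballot in ballots:
        for points, cand in zip(range(n, 0, -1), ballot):
            scores[cand] = scores.get(cand, 0) + points
    return scores

three_way = [list("CBA")] * 3 + [list("ACB")] * 2 + [list("BAC")] * 2
four_way  = [list("CBAX")] * 3 + [list("AXCB")] * 2 + [list("BAXC")] * 2

print(borda(three_way))  # A: 13, B: 14, C: 15 -- ranking C B A
print(borda(four_way))   # A: 20, B: 19, C: 18, X: 13 -- ranking A B C X
```

No voter changed their relative ranking of A, B, and C; only X’s entry reversed the order.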
Thus, the entry of the losing candidate X into the race has completely reversed the ranking of A, B, and C, giving the result A B C X. With even seemingly “sophisticated” vote-tallying methods having such drawbacks, how are we to decide which is the best method? Of course, the democratic way to settle the matter would be to vote on the available systems. But then, how do we tally the votes of that election?
When it comes to elections, it seems that even the math used to count the votes is subject to debate!
Finally, what of the luckless John McCain, who dropped out of the presidential race after losing the Californian primary? Unlike the hypothetical examples we’ve looked at so far, in the case of the Californian primary we have real data to look at: the Sacramento Bee conducted an exit poll of voters. It showed that Californian voters would have favored McCain over Gore 48 to 43 in a two-candidate presidential race, but would have favored Gore over Bush 51 to 43 in a two-candidate battle between those two. The newspaper did not ask voters how they would have voted in a McCain-Bush presidential race, since that was never on the cards, of course. However, the official polls showed that Republicans split 60-35 in favor of Bush, while the registered Democrats who voted Republican split 64-31 in favor of McCain. If we assume that the entire Democratic party would have split that way, then we conclude that a McCain-Bush presidential race would have gone 50-45 in McCain’s favor. Based on these data, if the Californian votes had been tallied using the Borda count, McCain would have received 48 + 50 = 98 points, Gore 43 + 51 = 94, and Bush 45 + 43 = 88. In other words, McCain would have won!
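The tally here simply credits each candidate with his percentage support in each of his two head-to-head matchups and adds the two figures. In Python:

```python
# Exit-poll percentages quoted above; the McCain-Bush split is the
# inferred figure, not a number the newspaper measured directly.
pairwise = {
    ("McCain", "Gore"): (48, 43),   # McCain beats Gore 48-43
    ("Gore", "Bush"):   (51, 43),   # Gore beats Bush 51-43
    ("McCain", "Bush"): (50, 45),   # inferred: McCain beats Bush 50-45
}

# Sum each candidate's support across his two matchups.
totals = {}
for (x, y), (x_votes, y_votes) in pairwise.items():
    totals[x] = totals.get(x, 0) + x_votes
    totals[y] = totals.get(y, 0) + y_votes

for name in sorted(totals, key=totals.get, reverse=True):
    print(name, totals[name])   # McCain 98, Gore 94, Bush 88
```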
References: Alan D. Taylor, Mathematics and Politics: Strategy, Voting, Power and Proof, Springer-Verlag (1995). The McCain example was described by Dana Mackenzie in an article in the October issue of SIAM News.
Devlin’s Angle is updated at the beginning of each month.
Keith Devlin ( devlin@stmarys-ca.edu) is Dean of Science at Saint Mary’s College of California, in Moraga, California, and a Senior Researcher at Stanford University. His latest book is The Math Gene: How Mathematical Thinking Evolved and Why Numbers Are Like Gossip, published by Basic Books.
DECEMBER 2000
The Mathematics of Christmas
I guess it was an early sign that I was heading for a career in mathematics that, when I was a young child, the run-up to Christmas always presented me with a numerical puzzle. How could Santa Claus possibly visit all children at midnight on the same night? I never did get a satisfactory answer from my parents, whose stock response was “No one knows; he just does.”
These days, the adult me can address the question in a mathematically more sophisticated way. Just how big is the task facing Santa on Christmas Eve?
Let’s assume that Santa only visits those who are children in the eyes of the law, that is, those under the age of 18. There are roughly 2 billion such individuals in the world. However, Santa started his annual activities long before diversity and equal opportunity became issues, and as a result he doesn’t handle Muslim, Hindu, Jewish, and Buddhist children. That reduces his workload significantly, to under a fifth of the total: some 378 million. However, the crucial figure is not the number of children but the number of homes Santa has to visit. According to the most recent census data, the average family worldwide has 3.5 children per household. Thus, Santa has to visit 108,000,000 individual homes. (Of course, as everyone knows, Santa only visits good children, but we can surely assume that, on average, at least one child of the 3.5 in each home meets that criterion.)
That’s quite a challenge. However, by traveling east to west, Santa can take advantage of the different time zones, and that gives him 24 hours. Santa can complete the job if he averages 1250 household visits per second. In other words, for each Christian household with at least one good child, Santa has 1/1250th of a second to park his sleigh, dismount, slide down the chimney, fill the stockings, distribute the remaining presents under the tree, consume the cookies and milk that have been left out for him, climb back up the chimney, get back onto the sleigh, and move on to the next house. To keep the math simple, let’s assume that these 108 million stops are evenly distributed around the earth. That means Santa is faced with a mean distance between households of around 0.75 miles, and a total journey of just over 80 million miles. Hence Santa’s sleigh must be moving at roughly 940 miles per second, more than 4,000 times the speed of sound. A typical reindeer can run at most 15 miles per hour. That’s quite a feat Santa performs each year.
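The scheduling arithmetic can be reproduced in a few lines, using the figures from the text:

```python
# Santa's schedule, using the figures from the text.
children = 378_000_000            # Christian children under 18
children_per_home = 3.5           # average children per household
homes = children / children_per_home

seconds = 24 * 60 * 60            # time zones give Santa a full day
visits_per_second = homes / seconds
ms_per_home = 1000 / visits_per_second

print(f"{homes:,.0f} homes")                      # 108,000,000 homes
print(f"{visits_per_second:,.0f} visits/second")  # 1,250 visits/second
print(f"{ms_per_home:.1f} ms per household")      # 0.8 ms per household
```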
What happens when we take into account the payload on the sleigh? Assuming that the average weight of presents Santa delivers to each child is 2 pounds, the sleigh is carrying 321,300 tons—and that’s not counting Santa himself, who, judging by all those familiar pictures, is no lightweight. On land, a reindeer can pull no more than 300 pounds. Of course, Santa’s reindeer can fly. (True, no known species of reindeer can fly. However, biologists estimate that there are some 300,000 species of living organisms yet to be classified, and while most of these are insects and germs, we cannot rule out flying reindeer.) Now, there is a dearth of reliable data on flying reindeer, but let’s assume that a good specimen can pull ten times as much as a normal reindeer. This means that Santa needs 214,200 reindeer. Thus, the total weight of this airborne transportation system is in excess of 350,000 tons, which is roughly four times the weight of the Queen Elizabeth.
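The reindeer arithmetic works out as follows. The 300-pound weight assumed here for each flying reindeer is my own guess, not a figure from the text (which only gives what a reindeer can pull):

```python
TON = 2000                           # pounds per (short) ton

payload_lb = 321_300 * TON           # weight of the presents, per the text
land_pull_lb = 300                   # what an ordinary reindeer can pull
flying_pull_lb = 10 * land_pull_lb   # assumed: a flying reindeer pulls 10x

reindeer = payload_lb / flying_pull_lb
print(f"{reindeer:,.0f} reindeer")       # 214,200 reindeer

reindeer_weight_lb = 300             # assumed weight of one reindeer
total_tons = (payload_lb + reindeer * reindeer_weight_lb) / TON
print(f"{total_tons:,.0f} tons in all")  # 353,430 tons, over 350,000
```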
Now, 350,000 tons traveling at that speed creates enormous air resistance, and this will heat the reindeer up in the same fashion as a spacecraft re-entering the earth’s atmosphere. The two reindeer in the lead pair will each absorb some 14.3 quintillion joules of energy per second. In the absence of a NASA-designed heat shield, this will cause them to burst into flames spontaneously, exposing the pair behind them. The result will be a rapid series of deafening sonic booms, as the entire reindeer team is vaporized within 4.26 thousandths of a second. Meanwhile, Santa himself will be subjected to centrifugal forces 17,500 times greater than gravity. That should do wonders for his waistline.
Christmas is indeed a magical time.
NOTE: The above column is adapted from a longer piece that has been circulating on the web over the past few years. I have no idea where it originated. If you’ve seen it before, my apologies. I thought it was worth giving it a new outing.