This blog post will be an exercise in reviewing some ideas that I learned when teaching the history of mathematics. A basic challenge in the field is researching non-literate cultures. Professors in the social sciences and humanities are very familiar with this issue and its possible solutions: use linguistic data, or archaeological data, or. . . . How can we use archaeological data to study how a culture does mathematics?
Alexander Thom used archaeological data in an intriguing way in his study of Neolithic England. England contains a number of megalithic structures believed to have been built during the Neolithic period (c. 4500 BC–1700 BC). The most famous of these is Stonehenge. Among his other claims, he argued that measurements of these structures show that people during the Neolithic period had a standard unit of length, which he termed the "megalithic yard." Moreover, he estimated this unit to be about 2.72 feet.
The manner in which he derived this claim involves an interesting statistical idea. Thom spent decades measuring the dimensions of various megalithic structures. If Neolithic people used a standard unit, then most of these lengths should be close to integer multiples of that unit. Here I want to explain the methods Thom used.
To illustrate the statistical ideas, I want to look at a more familiar problem that involves similar statistical issues: do books have a standard chapter length? We could try to answer this question by looking at a large number of books and recording the length of each chapter. I do not want to do this. Instead, I want to answer the question using only the page counts of many books. This is similar to the problem facing Thom. He did not have a large number of measurements that he believed to be approximations of the "megalithic yard"; he only had measurements that he believed were integer multiples of the "megalithic yard."
The basic mathematical idea is simple but powerful. Let X_i = the page count of a randomly chosen book. The pages are divided into some number of chapters together with things like a title page, an author's biography, etc. If chapter lengths were standardized to "q" pages, then we would have X_i = Y_i * q + E_i, where Y_i is an integer (the number of chapters) and E_i is an error term, a small random number.
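To make this model concrete, here is a minimal Python sketch that simulates page counts under the assumption of a standard chapter length. The quantum q = 8 pages, the range of chapter counts, and the amount of front and back matter are all made-up illustration values, not estimates from real books:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

q = 8                                   # hypothetical standard chapter length, in pages
n_books = 257                           # same sample size as the experiment below
Y = rng.integers(10, 40, size=n_books)  # chapters per book (a made-up range)
E = rng.integers(0, 3, size=n_books)    # extra pages (title page, bio, etc.), small relative to q
X = Y * q + E                           # simulated page counts X_i = Y_i * q + E_i
```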
Now consider cos( 2 π X_i / r). If r = q, then cos( 2 π X_i / q) = cos( 2 π Y_i + 2 π E_i / q) = cos( 2 π E_i / q), since Y_i is an integer, and this is approximately equal to "+1" because E_i is small. If we sum cos( 2 π X_i / r) over a large number of books, then we will get a large number.
What if there isn't anything like a standard page length for a chapter? Then cos( 2 π X_i / r) should be a random number between -1 and +1, so if we sum over a large number of books, we will get a number close to 0. A similar thing will happen if there is a standard length q but r is a different, unrelated value.
This suggests a strategy. Record the page counts of a large number of books, compute the sum of cos( 2 π X_i / r) for various values of r, and then check whether there are any values of "r" for which the sum is large. It is convenient to modify this idea slightly by dividing the sum by the square root of (number of books)/2. (This normalizes things to be independent of the number of books chosen.)
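Here is a sketch of this strategy in Python; this is my own illustration, not Thom's original procedure, and it reuses the simulated page counts X from the sketch above:

```python
import numpy as np

def quantogram(X, r):
    """Normalized cosine sum: large when the values in X cluster near
    integer multiples of r, close to zero otherwise."""
    X = np.asarray(X, dtype=float)
    return np.sum(np.cos(2 * np.pi * X / r)) / np.sqrt(len(X) / 2)

# Scan candidate chapter lengths r and look for peaks.
candidates = np.arange(2.0, 30.0, 0.1)
scores = [quantogram(X, r) for r in candidates]
best = candidates[int(np.argmax(scores))]
print(f"Best candidate chapter length: {best:.1f} pages")
```

On the simulated data, the score peaks near r = 8, the quantum we built in, and hovers near zero elsewhere.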
I implemented this idea by looking up the page counts of 257 books that appeared on the New York Times Best Seller list or were read by Oprah's Book Club. These books are a mix of popular fiction, literary fiction, and nonfiction, but all were published by mainstream publishing houses. I did not make any effort to randomize my choices; I just looked up books on Wikipedia and Amazon.
What did this experiment produce? It produced the following chart:
The x-axis measures the different possible values of "r", a possible page length for a chapter. The y-axis measures the normalized sum of cos( 2 π X_i / r). Thus most y-values should be close to zero; any large values should correspond to a standard chapter length.
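For readers who want to produce this kind of chart themselves, here is a minimal matplotlib sketch, reusing the candidates and scores arrays from the code above (the axis labels and title are mine):

```python
import matplotlib.pyplot as plt

plt.plot(candidates, scores)
plt.xlabel("candidate chapter length r (pages)")
plt.ylabel("normalized sum of cos(2 pi X_i / r)")
plt.title("Scanning for a standard chapter length")
plt.show()
```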
What do we see? Two standard chapter lengths clearly stand out: 16 pages and 8 pages. There is a natural explanation for why we see two different values: presumably, the 16-page chapters were printed with double spacing, while the 8-page chapters were printed with single spacing.
Notice also that while both r = 8 and r = 16 stand out, r = 16 has a noticeably higher y-value than r = 8. An explanation is that I happened to select more books that were printed with double spacing than with single spacing.
How does this compare with the advice given to aspiring writers? There are a lot of advice webpages, and many say that a typical chapter is approximately 1,000 to 5,000 words. At roughly 500 words per single-spaced page, this corresponds to 2 to 10 single-spaced pages. Eight pages fits into this range, but our statistical analysis suggests that it is very uncommon for writers to deviate from an 8-page single-spaced chapter.
This exercise is a nice illustration of the statistical ideas that Thom used. It also indicates that we should be cautious in interpreting this type of statistical analysis. Thom argued that his data indicates that the "megalithic yard" functioned like a modern meter. Until the late 20th century, a meter was defined to be the length of a prototype meter bar kept by the International Bureau of Weights and Measures. But our chapter-length quantum emerged from shared printing conventions, not from any such physical standard, so a clear statistical signal alone cannot tell us which kind of standard was at work.
[Image: A prototype meter bar, from the Science Museum Group]