Saturday, March 9, 2024

How long is a book chapter?

This blog post is an exercise in reviewing some ideas that I learned while teaching the history of mathematics. A basic challenge in the field is researching non-literate cultures. Professors in the social sciences and humanities are very familiar with this issue and its possible solutions: use linguistic data, archaeological data, and so on. How can we use archaeological data to study how a culture does mathematics?

Alexander Thom used archaeological data in an intriguing way in his study of Neolithic England. England contains a number of megalithic structures believed to have been built during the Neolithic period (c. 4500 BC–1700 BC). The most famous of these is Stonehenge. Among his other claims, he argued that measurements of these structures show that people during the Neolithic period had a standard unit of length, which he termed the "megalithic yard." Moreover, he estimated this unit to be about 2.72 feet. 

The way Thom derived this claim involves an interesting statistical idea. Thom spent decades measuring the lengths of various megalithic structures. If Neolithic people used a standard unit, then most of these lengths should be close to integer multiples of that unit. Here I want to explain the methods Thom used.

To illustrate the statistical ideas, I want to look at a more familiar problem that involves similar statistical issues: do books have a standard chapter length? We could try to answer this question by looking at a large number of books and recording the length of each chapter. I do not want to do this. Instead, I want to answer this question using only the page lengths of many books. This is similar to the problem facing Thom. He did not have a large number of measurements that he believed to be approximations to the "megalithic yard"; he only had measurements that he believed were integer multiples of the "megalithic yard."

The basic mathematical idea is simple but powerful. Let X_i be the page length of the i-th book in a randomly chosen sample. The pages are divided into some number of chapters together with things like a title page, an author's biography, etc. If chapter length were standardized to q pages, then we would have X_i = Y_i * q + E_i, where Y_i is an integer (roughly, the number of chapters) and E_i is an error term, a small random number that absorbs the title page and similar material.

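Now consider cos(2π X_i / r). If r = q, then cos(2π X_i / q) = cos(2π Y_i + 2π E_i / q) = cos(2π E_i / q), because shifting the argument of the cosine by a whole multiple of 2π does not change its value. Since E_i is small compared with q, this is approximately +1. If we sum cos(2π X_i / r) over a large number of books, then we will get a large positive number, close to the number of books.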
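Written out term by term, with a small-angle approximation added to show how close to +1 each term is (the approximation step is an addition for illustration, using cos θ ≈ 1 − θ²/2):

```latex
\cos\left(\frac{2\pi X_i}{q}\right)
  = \cos\left(\frac{2\pi (Y_i q + E_i)}{q}\right)
  = \cos\left(2\pi Y_i + \frac{2\pi E_i}{q}\right)
  = \cos\left(\frac{2\pi E_i}{q}\right)
  \approx 1 - \frac{2\pi^2 E_i^2}{q^2}
```

For example, with q = 16 and an error of a single page (E_i = 1), the exact value is cos(π/8) ≈ 0.92.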

What if there isn't anything like a standard chapter length? Then cos(2π X_i / r) should behave like a random number between -1 and +1 whose positive and negative values balance out, so if we sum over a large number of books, we will get a number that is close to 0 compared with the number of books. A similar thing will happen if there is a standard length q, but r is a different, unrelated integer.
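A quick simulation makes the contrast between these two cases concrete. This is a sketch on made-up data, not the books used in this post: the standard length q = 16, the page ranges, the error range of ±1 page, and the helper name `cosine_sum` are all assumptions chosen for illustration.

```python
import math
import random


def cosine_sum(page_counts, r):
    """Raw sum of cos(2*pi*X_i / r) over all books (no normalization)."""
    return sum(math.cos(2 * math.pi * x / r) for x in page_counts)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_books = 1000
q = 16  # hypothetical standard chapter length, in pages

# Model from the text: X_i = Y_i * q + E_i, with Y_i an integer and |E_i| <= 1.
structured = [rng.randint(10, 30) * q + rng.uniform(-1, 1) for _ in range(n_books)]

# No standard length: integer page counts spread evenly over 100..595.
unstructured = [rng.randint(100, 595) for _ in range(n_books)]

structured_sum = cosine_sum(structured, q)
unstructured_sum = cosine_sum(unstructured, q)

# Every structured term is at least cos(pi/8), about 0.92, so this sum is near n_books.
print(f"standard length q: {structured_sum:.1f}")
# These terms average out to about 0, so this sum is small compared with n_books.
print(f"no standard length: {unstructured_sum:.1f}")
```

For the unstructured case the typical size of the sum is about √(1000/2) ≈ 22, which grows with the number of books; that is what the normalization in the next paragraph corrects for.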

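This suggests a strategy. Record the page lengths of a large number of books, compute the sum of cos(2π X_i / r) for various values of r, and then check whether there are any values of r for which the sum is large. It is convenient to modify this idea slightly by dividing the sum by the square root of (number of books)/2. This normalizes things to be independent of the number of books chosen: when there is no standard length, each cosine term has mean about 0 and variance about 1/2, so the sum of N terms has variance about N/2, and dividing by √(N/2) gives a value that behaves roughly like a standard normal random variable.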
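A minimal sketch of the normalized statistic, assuming the page counts are already collected in a list. The function name `cosine_score` and the sample list are hypothetical; the same statistic appears in the archaeology literature under the name "cosine quantogram."

```python
import math
from typing import Sequence


def cosine_score(page_counts: Sequence[float], r: float) -> float:
    """Return sqrt(2/N) * sum_i cos(2*pi*X_i / r) for N page counts X_i.

    With no relation between the counts and r, each cosine term has mean about 0
    and variance about 1/2, so the result is roughly standard normal.
    """
    n = len(page_counts)
    if n == 0:
        raise ValueError("need at least one page count")
    if r <= 0:
        raise ValueError("r must be positive")
    total = sum(math.cos(2 * math.pi * x / r) for x in page_counts)
    return total / math.sqrt(n / 2)


# Hypothetical check: 50 books whose page counts are exact multiples of 8.
example = [8 * k for k in range(20, 70)]
# Every term equals 1, so the score is about 50 / sqrt(25) = 10.
print(round(cosine_score(example, 8), 6))
```

Dividing by √(N/2) rather than by N keeps the "no pattern" scores on a common scale: a collection of 50 books and one of 5,000 books both give values on the order of 1 when there is no standard length.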

I implemented this idea by looking up the page lengths of 257 books that appeared on the New York Times Best Seller list or were read by Oprah's Book Club. These books are a mix of popular fiction, literary fiction, and nonfiction, but all books were published by mainstream publishing houses. I did not make any effort to randomize my choices; I just looked up books on Wikipedia and Amazon.

What did this experiment produce? It produced the following chart:


The x-axis shows the different possible values of r, a possible page length for a chapter. The y-axis shows the normalized sum of cos(2π X_i / r). Most y-values should be close to zero; any unusually large values correspond to a candidate standard chapter length.
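The scan that produces a chart like this can be sketched as follows. This runs on made-up page counts, not the 257 books in the post; the helper names `scan` and `top_peaks` and the range of candidate r values are assumptions for illustration.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def cosine_score(page_counts: Sequence[float], r: float) -> float:
    # Normalized statistic: sqrt(2/N) * sum of cos(2*pi*X_i / r).
    total = sum(math.cos(2 * math.pi * x / r) for x in page_counts)
    return total / math.sqrt(len(page_counts) / 2)


def scan(page_counts: Sequence[float], r_values: Iterable[float]) -> Dict[float, float]:
    # One (x, y) point per candidate chapter length r.
    return {r: cosine_score(page_counts, r) for r in r_values}


def top_peaks(scores: Dict[float, float], k: int = 3) -> List[Tuple[float, float]]:
    # The k candidate values of r with the largest scores, highest first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical data: 25 books whose page counts are all multiples of 12.
pages = [12 * k for k in range(15, 40)]
scores = scan(pages, range(5, 31))
for r, score in top_peaks(scores):
    print(r, round(score, 2))
```

On this data, r = 6 and r = 12 share the top score (√50 ≈ 7.07), since every multiple of 12 is also a multiple of 6; every other r between 5 and 30 stays well below that.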

What do we see? Two standard chapter lengths clearly stand out: 16 pages and 8 pages. There is a natural explanation for why we see two different values: presumably, the 16-page chapters were printed with double spacing, while the 8-page chapters were printed with single spacing.

Notice also that although both r=8 and r=16 stand out, r=16 has a noticeably higher y-value than r=8. One explanation is that I happened to select more books that were printed with double spacing than with single spacing.
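Before reading two peaks as two populations of books, it may be worth checking what the statistic does with a single population. Every multiple of 16 is also a multiple of 8, so page counts clustered near multiples of 16 raise the score at r = 8 as well. The sketch uses three made-up collections (`exact_16`, `near_16`, and `mixed_8` are illustrative assumptions, not data from this post).

```python
import math


def cosine_score(page_counts, r):
    # Normalized statistic from the text: sqrt(2/N) * sum of cos(2*pi*X_i / r).
    total = sum(math.cos(2 * math.pi * x / r) for x in page_counts)
    return total / math.sqrt(len(page_counts) / 2)


# Three hypothetical collections of 20 books each (illustrative, not real data).
exact_16 = [16 * k for k in range(10, 30)]     # exact multiples of 16
near_16 = [16 * k - 1 for k in range(10, 30)]  # one page short of a multiple of 16
mixed_8 = [8 * k for k in range(20, 40)]       # even and odd multiples of 8

for name, pages in [("exact_16", exact_16), ("near_16", near_16), ("mixed_8", mixed_8)]:
    print(name, round(cosine_score(pages, 8), 2), round(cosine_score(pages, 16), 2))
```

`exact_16` scores the same at r = 8 and r = 16. `near_16` scores higher at r = 16 (about 5.8 versus 4.5), because a one-page deviation is 1/16 of a cycle at r = 16 but 1/8 of a cycle at r = 8. `mixed_8` scores high at r = 8 and about 0 at r = 16, because odd multiples of 8 contribute -1 there. So a tall peak at 16 next to a shorter one at 8 is also consistent with a single population clustered near multiples of 16.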

How does this compare with the advice given to aspiring writers? There are a lot of advice webpages, and many say that a typical chapter is approximately 1,000 to 5,000 words. This corresponds to 2 to 10 single-spaced pages. Eight pages fits into this range, but our statistical analysis suggests that it is very uncommon for writers to deviate from an 8-page single-spaced chapter.
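The word-to-page conversion here is simple arithmetic. A minimal sketch, assuming the roughly 500 words per single-spaced page implied by "1,000 to 5,000 words corresponds to 2 to 10 pages" (the constant and function names are hypothetical):

```python
# Assumption: about 500 words per single-spaced page, the ratio implied by
# "1,000 to 5,000 words corresponds to 2 to 10 single-spaced pages".
WORDS_PER_SINGLE_SPACED_PAGE = 500


def words_to_pages(words: int, words_per_page: int = WORDS_PER_SINGLE_SPACED_PAGE) -> float:
    """Convert a word count to an approximate number of single-spaced pages."""
    return words / words_per_page


def pages_to_words(pages: float, words_per_page: int = WORDS_PER_SINGLE_SPACED_PAGE) -> float:
    """Convert single-spaced pages back to an approximate word count."""
    return pages * words_per_page


print(words_to_pages(1_000), words_to_pages(5_000))  # 2.0 10.0
print(pages_to_words(8))  # 4000
```

Under this assumption, an 8-page single-spaced chapter is about 4,000 words, toward the upper end of the advised range.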

This exercise is a nice illustration of the statistical ideas that Thom used. It also indicates that we should be cautious in interpreting this type of statistical analysis. Thom argues that his data indicate that the "megalithic yard" functioned like a modern meter. Until 1960, a meter was defined to be the length of a prototype meter bar kept by the International Bureau of Weights and Measures.

A prototype meter bar
From the Science Museum Group

Thom argued that there was something like a Megalithic Bureau of Weights and Measures in England: a centralized body that determined and recorded standard measures, together with a specially educated class of people who used these standards to make measurements. This is a bold claim, as archaeologists do not believe that Megalithic societies in Western Europe maintained anything like this level of organization.

Our analysis of book chapters suggests an alternative to Thom's claim. Our analysis shows that there is a very standard chapter length: 16 pages double-spaced or 8 pages single-spaced. But there is certainly nothing like an International Bureau of Publishing Measures, and authors like Stephen King and William Faulkner surely do not write with a view to producing standard-length chapters. Instead, these standard lengths were produced in a complicated, decentralized way involving the commercial needs of publishers, the training of authors, and the desires of readers.

I am far from an expert on Megalithic culture, but as best I can tell, Thom's claims (which date to the 1960s) have become mired in controversy over the statistical significance of his data. Also important, it seems to me, is exploring the different ways in which a culture can develop and maintain a standard system of measurements. The case of chapter lengths demonstrates that we should think beyond centralized institutions like an International Bureau of Standards.

Update
Jordan Ellenberg pointed me to this webpage for an explanation of what is happening with book page counts. The fact that book page lengths tend to be multiples of 8 or 16 is an artifact of printing press technology, not editorial practice. As the webpage explains, books are produced by printing multiple pages on a single sheet and then folding the sheet. Pages are printed on both the front and back of the sheet, and each time the sheet is folded, the number of pages doubles. So a printer can naturally produce 2, 4, 8, 16, 32, or more pages per sheet. The number of pages per sheet depends on the size of the pages relative to the sheet, and it is most common to print 8 or 16 pages per sheet.
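The folding arithmetic can be sketched in a few lines. This is a simplified model, assuming every sheet in a book is folded the same way (real books can mix sheet sizes); the function names are hypothetical.

```python
def pages_per_sheet(folds: int) -> int:
    """Pages carried by one printed sheet after a given number of folds.

    An unfolded sheet printed on both sides carries 2 pages, and each fold
    doubles that: 0, 1, 2, 3, 4 folds give 2, 4, 8, 16, 32 pages.
    """
    return 2 * 2 ** folds


def physical_pages(text_pages: int, per_sheet: int = 16) -> int:
    """Round a count of text pages up to a whole number of folded sheets."""
    sheets = (text_pages + per_sheet - 1) // per_sheet  # ceiling division
    return sheets * per_sheet


def blank_pages(text_pages: int, per_sheet: int = 16) -> int:
    """Pages left over once the text runs out."""
    return physical_pages(text_pages, per_sheet) - text_pages


print([pages_per_sheet(f) for f in range(5)])  # [2, 4, 8, 16, 32]
print(physical_pages(321), blank_pages(321))   # 336 15
```

Under this model the physical page count is always a multiple of the sheet size, even when the text itself is not, which is why page counts cluster at multiples of 8 or 16.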

I took another look at the data I collected with this printer information in mind. The published page length appears to be the number of pages with text on them, rather than the number of physical pages. Every leaf has both a front and a back, so the number of physical pages is always even, but roughly a quarter of the books I looked at have an odd number of pages. Presumably, someone involved in the publishing process tries to come up with enough printed text so that all of the physical pages get used, but often there isn't enough material.

With this information in mind, an interesting exercise for someone who wants to explore further would be to collect page counts for oversized art books and see whether multiples of 4 show up. Another direction would be to look at eBooks.

In any case, I think I inadvertently demonstrated my point that interpreting this type of evidence is tricky. Statistical analysis clearly shows that book page counts tend to come in multiples of 8 and 16, but explaining why this is the case requires going beyond the statistics and exploring how book publishing functions.
