U.S. Copyright History 1923–1964

A little over a year ago, Greg Cram wrote about a pilot project we at NYPL were just beginning that aims to unlock the record of American creativity. At that point, he and our then-colleague Josh Hadro (now managing director of the IIIF Consortium) got the ball rolling, wrote a fair amount of initial documentation, and selected a vendor, DCL, to convert the first batch of volumes of the Catalog of Copyright Entries (CCE) from scanned images to parsed XML.

At the beginning of May, we completed the full run of the "Book" volumes of the CCE dating from 1923 to 1964. This gives us the best view, to date, of the number of books registered for copyright during this period in the U.S., as well as how many of these had their copyright renewed and extended.

Bar chart showing number of copyright registrations and renewals per year

See Chart Data below for raw numbers

The rough totals: 642,000 registered copyrights with 162,000, or 25%, renewed. Those renewed books are still in copyright today, and the copyrights on the other 480,000 books have expired and are (probably) in the Public Domain. These are initial results, so some important details and caveats follow.

Why Is This Important? Why 1923–1964?

In libraries, we like to have digital versions of books. We buy thousands of ebooks from publishers (which you can get for free with our SimplyE app); but we also have millions of older books in our research divisions that we would love to digitize and make more easily available to people doing research around the world.

During the term of copyright protection, the rights holder has the exclusive right to make and distribute copies of their book. When we want to make a digital copy at NYPL, we need either to have the rights holder's permission, or to rely on an exception or limitation in copyright law. You can go to Hathi Trust, or its non-library equivalent, Google Books, and search more books than you could ever read because indexing the content of an in-copyright book for search is considered fair use; however, presenting the full text may be an infringement.

For many years, the rule of thumb has been that any book published after 1923 was in-copyright (in the U.S.). It takes a bit of convoluted history to explain this date (but Sonny Bono appears at the end):

So, a book published in 1922 and renewed after its first 28-year term had a copyright that lasted until 1978 (1922 + 28 + 28). If a book was published in 1923 and renewed for a second term, it would have been just at the end of its copyright in 1978 when its term was extended to 75 years. It would have been about to enter public domain again in 1998, when this was extended to 95 years, through 2018. This is why January 1, 2019 was the first day that any books had entered the public domain in more than 20 years.

But what if the book wasn't renewed? After its first copyright term, a book published in 1923 became public domain in 1951 (1923 + 28). A book published in 1963 was subject to the same copyright law. If it wasn't renewed in 1991 (1963 + 28), it became public domain at the start of 1992.

For a long time, any book published before 1923 has surely been in the Public Domain and any book published after 1963 has positively been in copyright. Between those two dates though there is a more complex zone I'll call the Renewal Era. Of course, the lack of a renewal is not quite enough to say that something is no longer in copyright. As John Ockerbloom pointed out when I initially tweeted about these results, unrenewed books might "include previously published material still under copyright, or [have been] published abroad 1st & meet certain other URAA conditions."

Registrations and Renewals

Assuming, for simplicity's sake, that none of those considerations are relevant to a particular book, if it was published during the Renewal Era and not renewed, then it is in the public domain. To figure out if a book has been renewed, you turn to the Catalog of Copyright Entries, many dozens of thick volumes published every year until 1977 (after which the copyright records became electronic).

A large volume of the Catalog of Copyright Entries, several inches thick

The various volumes of the CCE contain registrations and renewals of every kind of copyrightable work, including books, music, movies, artworks, and labels on commercial products. They are, in Greg Cram's words, "one of the best records of American creativity." We're interested in all these things at the Library, but because books are relatively easy to digitize and use in digital form, we would like to know which ones are still in copyright and which aren't.

Since 2007, renewals have been in a searchable database at Stanford, making it fairly simple to find books that have been renewed. Proving the negative, that something wasn't renewed, hasn't been as easy: typos, slight changes in titles, and other complications might cause a renewal search to fail. It has also been difficult to say what percentage of books have been renewed. Estimates, based on samples, have ranged from 7% to 33%.
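To illustrate why an exact-match search can miss a renewal, here is a minimal Python sketch that compares titles after normalizing case and punctuation. The function names and the typo example are my own illustrations, not part of the Stanford database or our repositories, and a similarity score is only a hint that two entries deserve a closer look.

```python
import difflib
import re


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(title.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1 for two normalized titles."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Same title, different capitalization and punctuation: an exact
# string comparison fails, but the normalized forms are identical.
as_renewed = "An Anthology of Russian verse, 1812-1960."
as_searched = "An anthology of Russian verse 1812-1960"
exact_match = as_renewed == as_searched
normalized_score = title_similarity(as_renewed, as_searched)

# A hypothetical one-letter typo still scores close to 1.
typo_score = title_similarity("Arctic village", "Artic village")
```

Any cutoff for "close enough" would have to be tuned against real entries, and a high score between two different books is always possible.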

While there is still plenty of work to do to clean up this data and understand some nuances of the entries, for the first time we have both ends of the copyright lifetime in a digital, ultimately searchable form for a full category of works, over a complete and continuous period of time. With the registrations now in digital form, not only do we have more information about the renewed books, we can also identify all those that do not have corresponding renewals.
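Finding the registrations that lack a renewal amounts to what database people call an anti-join. A rough sketch in Python follows. The records and the registration-side field names (`regnum`, `regdate`) are made up for illustration; `oreg` and `odat` echo the original registration number and date fields seen in Stanford renewal records. I pair the number with the date here because, as far as I can tell, a registration number alone is only unique within a series.

```python
# Hypothetical sample records; the values are invented for this sketch.
registrations = [
    {"regnum": "A000001", "regdate": "1950-03-01", "title": "First example"},
    {"regnum": "A000002", "regdate": "1950-03-02", "title": "Second example"},
    {"regnum": "A000003", "regdate": "1950-03-03", "title": "Third example"},
]
renewals = [
    {"id": "R000010", "oreg": "A000002", "odat": "1950-03-02"},
]


def match_key(number: str, date: str) -> tuple:
    """Build the key used to pair a registration with its renewal."""
    return (number.strip().upper(), date)


# Index the renewals once, then split the registrations in a single pass.
renewed_keys = {match_key(r["oreg"], r["odat"]) for r in renewals}

renewed = [r for r in registrations
           if match_key(r["regnum"], r["regdate"]) in renewed_keys]
unrenewed = [r for r in registrations
             if match_key(r["regnum"], r["regdate"]) not in renewed_keys]
```

An entry landing in `unrenewed` only means no matching renewal was found; a mis-keyed number on either side would produce the same result.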

What's in the Data

We are publishing the data in two repositories:

The bulk of the effort has been to convert book registrations from 1923 to 1964 into XML format. This includes Part 1, Group 1 (1923–1946), Part 1A (1947–1953), and Part 1 (1953–1964) of the CCE. In addition, we have created a new version of the renewals in tab-delimited format (the same information found in the Stanford database, but parsed differently to work more accurately with the registrations).

The renewal data contains both halves of Part 1 (Groups 1 and 2, Parts 1A and 1B) as well as their combined versions for 1950-1977, parsed from a transcription made by Project Gutenberg. For the years 1978 on, there are registrations for all classes taken from a version of the renewals exported from the Copyright Office database and hosted by Google.

Beginning with July 1953, the "Book" volume is Part 1, "Books and Pamphlets, Including Serials and Contributions to Periodicals." Prior to this, pamphlets, serials, and contributions to serials (and sermons, lectures, and many other things) were published separately as Group 2 or Part 1B, which are not included in this data yet. For the first half of 1953, there are about 8,200 entries from 3rd series, volume 7, part 1A, number 1; for the second half of the same year, there are more than 20,000 entries because 3rd series, volume 7, part 1, number 2 included everything that previously would have been published separately in part 1B.

Books and Not-Books

Every registration is assigned to a class as indicated by the letter prefix of its registration number: "A" for books, "B" for serials, "D" for dramas, etc. This nominally corresponds to the division into volumes so we would expect all the "D"s to be in the "Dramatic Compositions" volume (Part 1, Group 3, later Part 3). In practice this is not the case—Eugene O'Neill's A Moon for the Misbegotten, for instance, is included in Part 1, Group 1 (1952; DP1117) along with a few hundred other class "D" registrations.
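As a small illustration of reading the class from a registration number, the Python sketch below splits a number into its letter prefix and numeric part. The helper names are hypothetical, and the pattern assumes an all-letter prefix, so a class such as "A5" would need special handling that is not attempted here.

```python
import re

# Letter prefix followed by digits, e.g. "A696442" or "DP1117".
_REGNUM = re.compile(r"^([A-Z]+)(\d+)$")


def split_regnum(regnum: str) -> tuple[str, int]:
    """Return (prefix, number) for a registration number."""
    match = _REGNUM.match(regnum.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized registration number: {regnum!r}")
    return match.group(1), int(match.group(2))


def class_letter(regnum: str) -> str:
    """First letter of the prefix: 'A' for books, 'D' for dramas, and so on."""
    return split_regnum(regnum)[0][0]


hilda = split_regnum("A696442")        # ("A", 696442)
moon = split_regnum("DP1117")          # ("DP", 1117)
moon_class = class_letter("DP1117")    # "D"
```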

We might wonder why DP1117 wasn't published in group 3 with the other "D"s or why, if it's more like a book somehow, it wasn't given an "A" number. It raises the question, though: are there any class "A" entries in Group 1 or Part 1A that someone might class as plays? I was able to find 100 entries that have "… a play in …" in the title, from Hilda; a play in four acts by Frances Guignard Gibbes (1923; A696442) to Seven devils from Magdala; a play in three acts.

Because of examples like this, I think it's fairly fruitless to try to determine what is a book or a "book proper" from the information in the CCE, so we have simply counted the contents of the volumes we have digitized. "A Moon for the Misbegotten" was renewed as were about 15% of the class 'D' entries in Group 1/Part 1A. The situation is worse with class "A" entries, where the not-very-well-held distinction between books (class "A") and non-books (classes such as "AA" and "A5") is partly erased after 1953. "AA" is done away with and presumably collapsed into "A". "A5" continued first as "B5" and then as "BB".

That said, inclusion in Group 1/Part 1A turns out to be a pretty good predictor of the kinds of things that tend to be renewed. If we look again at 1953, Part 1 Number 2, the second half of the year with the two groups combined, has 153% more entries than Part 1A Number 1 (20,811 vs. 8,217), but only 30% more of those are renewed (2,820 vs. 2,154). This implies something like a 5% renewal rate for Group 2/Part 1B entries. Many of those few renewed items may, in fact, be books. We recently learned that children's books, for instance, were routinely lumped in with "pamphlets."
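The arithmetic behind these percentages can be worked through in a few lines of Python. The one assumption, which is mine, is that the two halves of 1953 are comparable, so that everything in the second half beyond the first-half counts can be attributed to the former Part 1B material.

```python
# Counts for 1953 as given in the text.
entries_1a, entries_combined = 8_217, 20_811    # Part 1A No. 1 vs. Part 1 No. 2
renewed_1a, renewed_combined = 2_154, 2_820

# Entries and renewals attributed to the former Part 1B material.
extra_entries = entries_combined - entries_1a       # 12,594
extra_renewals = renewed_combined - renewed_1a      # 666

pct_more_entries = 100 * extra_entries / entries_1a      # about 153%
pct_more_renewals = 100 * extra_renewals / renewed_1a    # about 31%
implied_1b_rate = 100 * extra_renewals / extra_entries   # about 5.3%
```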

Because of this change in the way the CCE was arranged, the count of renewals presented for 1953-63 must include some things that aren't "books". We also imagine some things that are "books" aren't counted for the years before 1953 because they are in Group 2/Part 1B, which we haven't converted yet. Also, because the count of unrenewed entries ("books" and "non-books") would be so much higher for 1953-63, I chose to estimate what would have been in part 1A if the 1A/1B distinction had continued. Non-renewed entries are estimated at 2.7 times the number of renewals (so the estimated total is 3.7 times the renewals). This is based on two generalizations: everything renewed is a book (close to true) and the 27% average renewal rate for 1946-1952 held for 1953-63.
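Here is a short Python sketch of that estimate, using figures from the Chart Data table. If renewals are 27% of the total, non-renewed entries are the remaining 73%, so their ratio to renewals is 0.73 / 0.27. The function name is my own, and rounding to the nearest whole entry is an assumption that happens to reproduce the table's estimated values.

```python
RENEWAL_RATE = 0.27  # assumed to carry over from 1946-1952


def estimate_not_renewed(renewed: int) -> int:
    """Estimate non-renewed entries from renewals at a fixed renewal rate."""
    return round(renewed * (1 - RENEWAL_RATE) / RENEWAL_RATE)


# Check the 27% figure against the 1946-1952 rows of the table.
renewed_1946_52 = [2954, 3583, 3544, 3568, 4257, 4255, 4138]
total_1946_52 = [11705, 13371, 12445, 13498, 15379, 15422, 16058]
average_rate = 100 * sum(renewed_1946_52) / sum(total_1946_52)  # about 26.9%

# Reproduce two of the estimated rows.
est_1953 = estimate_not_renewed(5160)   # 13951
est_1963 = estimate_not_renewed(8740)   # 23630
```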

The only class of entries excluded from the count is interim registrations (class "AI"), since they would be an obvious source of undercounting or double counting, depending on how renewals are matched to registrations. Ultimately, what we really want to be able to do is count copyrights rather than entries by grouping interim (AI) and foreign (AF) registrations together with corresponding A entries as a single entity. A handful of entries in each volume are very complicated to parse and have also been ignored for now. These tend to be things like dozens of issues of Bell System technical bulletins and aren't particularly interesting for this analysis.

Further Work

Two obvious tasks lie before us: correcting the data and adding more data. Beyond that, I'm sure many people would like to see an online interface for exploring the entries. Linking the data both internally (entry to entry) and to external identifiers would make it really useful in the library world.

Correcting the Data

The XML files for the completed volumes of the CCE amount to 687 MB of data, all of which has been scanned, OCRed, keyed, and tagged, so we expect that a certain number of errors occurred at each step. We are focusing mostly on the accuracy of ID numbers so that registrations and renewals can be correctly paired; fortunately, there are things we can do to chase down many mistakes. For instance, within the new series or third series, registration numbers should be unique and duplicates can be investigated (the light printing of some pages makes 0's, 3's, 6's and 8's especially difficult for OCR to distinguish). Frequently, the errors are typos in the CCE entries themselves.
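A rough Python sketch of that kind of check follows: first flag registration numbers that occur more than once, then list the alternative readings produced by swapping the easily confused digits. The sample numbers and function names are invented for illustration, and this is not the actual cleanup code.

```python
from collections import Counter
from itertools import product

CONFUSABLE = "0368"  # digits that light printing makes hard to tell apart


def find_duplicates(regnums):
    """Registration numbers that appear more than once, sorted."""
    counts = Counter(regnums)
    return sorted(num for num, count in counts.items() if count > 1)


def ocr_variants(regnum):
    """All other readings obtained by swapping confusable digits."""
    options = [CONFUSABLE if ch in CONFUSABLE else ch for ch in regnum]
    return {"".join(chars) for chars in product(*options)} - {regnum}


# Hypothetical numbers from one series; "A108" occurs twice.
series = ["A105", "A107", "A108", "A108", "A109"]
duplicates = find_duplicates(series)

# Variants of the duplicate that are missing from the series are
# candidates for what one of the two entries should have been.
seen = set(series)
candidates = sorted(v for v in ocr_variants("A108") if v not in seen)
```

The number of variants grows quickly with the count of confusable digits, so in practice one would want to narrow the candidates, for example to numbers that fill a gap in the sequence on the same page.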

Anyone who works with bibliographic data knows how difficult the many variations of authors' and publishers' names can be to deal with. Though the tagging of these fields is currently accurate enough to be very useful, this is probably the area most in need of correction. Even better would be to link authors and publishers to VIAF (Virtual International Authority File) and other identifiers.

We welcome correction from any source. If you think you have spotted an error, you can add an issue in the repository for registrations or renewals.

More Data

It is clear from the discussion above that, even if your interest is only books, the pre-1953 "pamphlet" volumes (Part 1 Group 2 and Part 1B) are still important. Beyond the books, the CCE covers every kind of creative endeavor and these volumes have a great deal of value as an historical record. Having a complete historical record, however, would mean converting not only the volumes for the years in which copyright is in question, but also the pre-1923 and post-1964 volumes. We are, at the moment, planning to do later volumes of Part 1, and would be happy to collaborate with anyone who wanted to take on any part of the CCE.

Linking Data

There are internal and external links that can be made. Links between registrations and renewals are explicit, but links between a registration and a previous interim registration, or to an original entry when new matter is being registered, are not always present.

Probably the most useful links would be between the registrations and equivalent records in other sources. Through 1937, the entries contain Library of Congress Control Numbers (LCCNs), which are a key to linking them to OCLC (Online Computer Library Center) records and Hathi Trust. It would be wonderful to have a way to make connections between these sources and entries from other years. Having an LCCN or OCLC number corresponding to a registration would make it easier to correctly link VIAF ids for authors and publishers, in order to make those searches more accurate.

Chart Data

Books are counted under the year of their registration rather than publication in the CCE. That is, a book with a 1950 registration date may be published in the 1950 volume of the CCE, but there is a good chance it appears in the 1951 volume, a smaller chance in the 1952 volume, and so on. Therefore, these numbers will not match the entry counts given in each printed volume since those are counts by publication rather than registration year.

Year    # Renewed  # Not Renewed  # Not Renewed (estimated)  Total     Percentage Renewed
1923    1593       7198           -                          8791      18.12%
1924    1633       7819           -                          9452      17.28%
1925    1796       8869           -                          10665     16.84%
1926    1955       9436           -                          11391     17.16%
1927    2185       10413          -                          12598     17.34%
1928    2384       11822          -                          14206     16.78%
1929    2697       11161          -                          13858     19.46%
1930    2559       11844          -                          14403     17.77%
1931    2726       10761          -                          13487     20.21%
1932    2677       9880           -                          12557     21.32%
1933    2495       8925           -                          11420     21.85%
1934    2666       9454           -                          12120     22.00%
1935    2875       9691           -                          12566     22.88%
1936    2989       9939           -                          12928     23.12%
1937    3201       9674           -                          12875     24.86%
1938    3242       10020          -                          13262     24.45%
1939    3109       8990           -                          12099     25.70%
1940    3374       9068           -                          12442     27.12%
1941    3451       7353           -                          10804     31.94%
1942    3229       5896           -                          9125      35.39%
1943    2814       5198           -                          8012      35.12%
1944    2585       4868           -                          7453      34.68%
1945    2444       5971           -                          8415      29.04%
1946    2954       8751           -                          11705     25.24%
1947    3583       9788           -                          13371     26.80%
1948    3544       8901           -                          12445     28.48%
1949    3568       9930           -                          13498     26.43%
1950    4257       11122          -                          15379     27.68%
1951    4255       11167          -                          15422     27.59%
1952    4138       11920          -                          16058     25.77%
1953    5160       -              13951                      19111     27.00%
1954    5915       -              15992                      21907     27.00%
1955    5984       -              16179                      22163     27.00%
1956    5925       -              16019                      21944     27.00%
1957    6731       -              18199                      24930     27.00%
1958    6787       -              18350                      25137     27.00%
1959    7256       -              19618                      26874     27.00%
1960    7420       -              20061                      27481     27.00%
1961    7503       -              20286                      27789     27.00%
1962    8017       -              21676                      29693     27.00%
1963    8740       -              23630                      32370     27.00%
Total   162416     -              -                          642206    25.29%

Comments

Patron-generated content represents the views and interpretations of the patron, not necessarily those of The New York Public Library. For more information see NYPL's Website Terms and Conditions.

A Treasury of Russian Verse, ed. Avrahm Yarmolinsky, (New York:

Is this book still copyrighted and/or has its copyright been renewed? Is it in the public domain? How could I search your database to determine its status? Thank you.

Apparently it was renewed. Stanford dataset renewal info below.

ID: RE467510, DATE: 1990, TITLE: An Anthology of Russian verse, 1812-1960., AUTHOR: Avraham Yarmolinsky., OREG: A597346, DREG: 23Feb90, ODAT: 19Jan62, CLNA: Adam Yarmolinsky (C), OCLS: C, LINM: NM: revisions & additions.,

Book in Public domain

Is Will Cuppy's book The Decline and Fall of Practically Everybody in the public domain yet? It was last published in 1950.

Japanese Flower Arrangement Notebook, by Patricia Kroh

I checked the Stanford Copyright Renewal database, and didn't see a record for this. Has this book's copyright ever been renewed? It was published in 1962. Library of Congress Catalog Card Number 62-7396. Thanks.

I’d like to know if book

Was copyright renewed?

obtaining permission

Using the Stanford database, I have found the book I want to use as first copyrighted May 4, 1933 and renewed April 26, 1961. Both owners of the copyright (the author and his brother) are deceased. It was reprinted in 1993 with a copyright symbol (by George Marshall, the author's brother). How do I find out from whom permission for use of an excerpt should be requested? The book is Robert Marshall's Arctic Village, renewal Id R274975 and it was renewed April 26, 1961. Thank you

Searchable

May I ask when exactly this file will be searchable by the general public, or is that not yet on the agenda?