Data schema, Phase 1

a brown corkboard littered with random photos, notes, and clippings held in place with colorful plastic pins, with red yarn strung between the pins connecting different articles. In the middle, a clipping says "seek for the truth."
Image by Freepik

When I was at ASECS 2024, I got a question after my presentation about how I was organizing my data. I thought the questioner meant how I was setting everything up overall, for example what schema I was going to use, and I only half answered because I don’t entirely know yet what I’m doing for the overall project. I have since realized that I made the question far more complicated than it needed to be. I also realized that my data organization wasn’t something I’d talked about here, and it likely should have been. I feel very strongly that transparency about the what, the how, and the why is important, a belief that was strengthened by a really strong presentation from the great scholars behind the Women’s Print History Project about the need for DH documentation that discusses the “why” as well as the “how.”

As I’ve mentioned before, this is a project with multiple phases. Phase One involved using OCLC WorldCat and the ESTC to locate individual copies and to catch publication information that was previously missed or spurious. I started with the excellent list of Lennox publications in Charlotte Lennox: An Independent Mind by Susan Carlile, which gave me a huge head start. That’s where the maps on this site came from: an early effort to identify locations and see where clusters of copies might be found. The limitations of Google Maps, though, specifically the cap on the number of points per map, made this of limited usefulness. I may yet go back and dump all of it into something like Tableau to get a master map, but I haven’t yet, and probably won’t until I’ve finished cleaning up the data set.

There was then an interim stage I consider Phase 1.5, in which I took the data out of Google Maps and put it into an Excel spreadsheet. The data as it stood was in the following categories:

  • ID #: this helped differentiate records so I could move them into a relational database eventually
  • Longitude: location info courtesy Google Maps
  • Latitude: location info courtesy Google Maps
  • Library Name: the name of the library with the holdings (preferably not just the special collections dept)
  • Description: intended to describe something about the book, such as edition # or suspected piracy, not the library
  • Affiliation: what institution is the library affiliated with, if any
  • Designation: what sort of library/institution it is. This is the muddiest category because sometimes it refers to the library (unaffiliated) and sometimes to the institution (affiliated)
  • Pub Year: publication year
  • Location: where was the book published
  • Bookseller: who was the bookseller/publisher
  • Title: Title of the book/periodical
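
For readers who think more easily in code than in column headings, the categories above can be sketched as a single record type. This is only a rough illustration: the class name, the field names, and every sample value are assumptions made up for the example, not the layout of the actual workbook.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class HoldingRecord:
    """One spreadsheet row: a single copy reported at a single library.

    Field names are hypothetical stand-ins for the column headings.
    """
    record_id: int               # ID #
    longitude: float             # location info from Google Maps
    latitude: float              # location info from Google Maps
    library_name: str            # the library with the holdings
    description: str             # about the book (edition, suspected piracy)
    affiliation: Optional[str]   # parent institution, if any
    designation: str             # what sort of library/institution it is
    pub_year: Optional[int]      # publication year
    pub_location: str            # where the book was published
    bookseller: str              # bookseller/publisher
    title: str                   # title of the book/periodical


# Placeholder values only; this is not a real record from the data set.
example = HoldingRecord(
    record_id=1,
    longitude=0.0,
    latitude=0.0,
    library_name="Example University Library",
    description="placeholder note about the copy",
    affiliation="Example University",
    designation="academic",
    pub_year=1752,
    pub_location="London",
    bookseller="Example Bookseller",
    title="The Female Quixote",
)

# asdict() turns the record back into a column-name -> value mapping.
print(asdict(example)["title"])
```

Keeping the description about the book and the affiliation/designation about the institution in separate fields mirrors the distinction drawn in the list.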

Now, as to why I chose to keep track of all this stuff, particularly about the institutions, part of it was because I thought it would be interesting to track this and see how the spread went. Part of it was because I thought it would help me track down funding to visit collections. And the last part of it, I think, is because I didn’t want to go back and add it in later in case I regretted not having it to begin with. It’s easier to cut info than to go back and add it in across all the records.

I’ve held to that philosophy as I’ve gone: I’d rather record everything plus some and end up with something unnecessary than record too little and wish later that I had more. There is absolutely a point of diminishing returns here, naturally, but it’s balanced by the realization that, for most of these copies, I will get only one bite at the apple. If I miss something or realize later what I needed, I may be able to get it by asking the librarians, but I likely won’t get a second visit to see it for myself. This is one reason my local copies are my test cases: they let me settle what information I’m recording and why before I start trying to build my photo album of all the special collections rooms I get to visit.

I hope this proves helpful to someone — I’m happy to provide more information or answer questions as needed. I’ll move toward describing Phase Two in an upcoming post.

State of the Project, March 2024

two puppies with party hats on in front of a tiny stuffed toy birthday cake
Photo by Sam Lion on Pexels.com

Greetings, Gentle Readers! Welcome to the Lennox Bibliography Project website’s second birthday! *blows noisemaker* I’m thrilled and amazed that I’m still working on this and that I’ve actually made progress, albeit more slowly than I would have liked. Still moving forward, though.

Again, this month has seen a ton of grading and not much in the way of progress. What it has also seen, however, is me prepping my presentation for the upcoming ASECS 2024 conference! Starting later next week (eep) I will be in Toronto, ON with a bunch of other 18th-century scholars and doing my part to further the (small-d) discourse in our field. For those in attendance who might be interested, I’ll be presenting on some of my findings thus far from the LBP.

Specifically, I’m on panel #112, Friday April 5th, from 4:30-6:00pm in the Elgin room. It’s the second of two panels from the Bibliographical Society of America, titled “Bibliography by the Numbers: Meta-Bibliography and the Study of Eighteenth-Century Book Culture.” My paper is titled (not terribly creatively) “The Travels of the Memoirs of the Duke of Sully.” Belatedly I realized I should have named it “So Many Copies: Charlotte Lennox and the Duke of Sully” but alas, inspiration came far too late. Feel free to stop by!

State of the Project, February 2024

Surly groundhog staring out from his/her burrow.
Photo by Niklas Jeromin on Pexels.com

Welcome back, Readers! I have returned from my research “vacation,” for certain values of “returned” anyway. I am still teaching the overload situation this semester, but it’s going well and I feel like I’m mostly not drowning at this point. I am only now resuming work on the project, though. It’s been a good time to take a break and I feel much more able to bring my attention back to it in a positive and constructive way.

One of the things this break brought home to me is how I’ve basically worked on this non-stop for two years at least, prior to the beginning of this past holiday season. It’s so easy to get caught up in the work, particularly when it’s something that we’re interested in and want to see happen, that we can fall victim to our academic training of “never stop working” and forget that sometimes it’s good to let the fields lie fallow for a bit.

I’m not saying that everyone is always in a situation where that’s feasible, or necessarily even desirable — we all have our own relationships to our work and research. One of the things I’m discovering on my own career journey, though, is the inexorability of erosion. Time, familiarity, training, and anxiety all wear away at our boundaries, our self-images, our work-life balance, and our storehouses of resolve and empathy. It takes active work to shore up those borders and keep ourselves whole and healthy. Much like the mansions on the hilltops in California now being undermined by landslides and erosion, with the ground beneath them slipping away into the ocean, we’re similarly at risk of being washed out and ground down to nothing. The world, your institution, your research, your students — none of them will tell you to stop when you’re already spread thin.

I know I’m not saying anything really new here. The thing that really hit me, though, is that although I do want to push forward and I’m excited about where this project is going to go, I don’t have a clock on this or a deadline I have to make. I don’t have to push for some external timeline that’s only in my head and sacrifice myself and my enthusiasm for this project in the process. Pacing myself — taking breaks — is a good thing. So for those following along in the background, I appreciate your patience and hope you bear with me during quieter times. I should have more progress to relate soon.

State of the Project, Holiday Season 2023

Photo by Dana Tentis on Pexels.com

Happy holidays, gentle readers! I hope this season finds you reasonably well, at least on a local level, given the general setting of GAAAAAH globally at the moment. I myself have been digging deep into work and family and friendship as a sustaining focus for the past two months, hence the lack of a distinct November update.

I had the good(?) news that I’m teaching both an extra class in the spring and a class online in the summer (assuming it fills), which on the one hand, yay! My car approves this decision. On the other hand, prep and grading and whatnot have largely eaten my ability to focus on Lennox work, so I’ve decided to let this lie fallow for the holiday season while I work on clearing my plate for the coming year. In January, therefore, I’ll be picking back up with the project work as normal. I hope to have new progress to update next time around.

For now, however, I hope that whatever you celebrate, you are able to take a moment to yourself and acknowledge the dark and light, the turning of the year and the brightness inherent in the human spirit even when things seem hopeless. May we all have a brighter 2024.

State of the Project, October 2023

a small lit jack-o-lantern next to a block calendar reading "31 October"
Photo by Александар Цветановић on Pexels.com

Greetings, gentle readers! And a happy decorative gourd season to all who celebrate. As we reach midterms in the fall semester and all the grading that entails, I have taken a break to update you on this month’s progress.

The process of confirming or eliminating physical copies continues apace. Since the last update, I’ve worked through another 37 libraries*, completing the libraries that begin with the letter “B”. It will likely not be a surprise to anyone to know that there are quite a few libraries that begin with the letter B that have Lennox holdings, and that’s after some elimination of spurious, online access, or deaccessioned copies.

As for the database work, I’ve about decided that I’ll put in the libraries I’ve confirmed as locations for now, but not the individual titles. I may change my mind about that as I move forward, but we’ll see. I go back and forth on the utility of it prior to actually seeing and cataloging the data for a given copy.

In other news, I’m also trying to decide if I want to apply to any fellowships for next summer’s research. I’d been hoping to be further along than I am (though it’s not like every letter will have as many entries). It seems a shame to let a summer go to waste, but I’ve still got so much cataloguing to do. I have a little while more to ponder and see if the pace picks up in this portion of the endeavor.

Number of libraries confirmed: 77
Number of libraries entered into the database: 2
Number of extant copies confirmed: 299 (See Note)

* Note: Those observant readers among you will have noted that the letter “B” also applies to “Bodleian,” as in the famed library (or set thereof) at the University of Oxford. I cannot claim to have completed verification of my list of items at the Bodleian, given the nature of its search engine, the fact that it defaults to including all the Oxford libraries, of which there are many, and the fact that it’s by no means uncommon to have multiple copies of the same work at different libraries within the greater institution of Oxford. I started working on it and quickly realized I was just going to need to pull out all the Oxford listings, put them in a big list with a couple of notepads and sticky notes as needed, and just do all of them in one big push. As I’m not planning to be in the UK any time in the near future, this isn’t really a problem per se, but it does mean that all Oxford listings are considered not fully confirmed until I climb that particular molehill.

State of the Project, September 2023

Greetings, gentle readers! September finds me pushing forward still, albeit a bit more slowly due to general life issues and a lot of time dedicated to sorting through the works housed at the Beinecke Rare Book and Manuscript Library at Yale. Needless to say, we’re going to have to spend some time there in the future, to no one’s surprise.

I currently have 30 libraries verified, having removed two or three so far that ostensibly had only a couple of items at most; those items turned out either to simply not be there, to belong to an affiliated library on the same system, or to be online or microform versions of the work. I am looking at applying to the Lewis Walpole Library Fellowship this year, along with perhaps a local fellowship that might cover some gas money for libraries near to hand.

Number of libraries confirmed: 34
Number of libraries entered into the database: 2
Number of extant copies confirmed: 148

State of the Project, August 2023

Photo by Luis Zheji on Pexels.com

Greetings, gentle readers! The end of Hot Data Summer is upon us, and I have nearly finished all my class prep for the next semester’s teaching. Around and among and before that, I’ve been busily embarking on Phase 2.2 of the project, which as stated in July’s update, involves breaking out the data by library/institution (it depends on the nature of the organization and its libraries — there’s a system, I promise) and verifying holdings via catalog searches and/or contacting the library directly in some cases.

Thus far I’ve completed a mere twenty libraries, but that’s still served to provide some interesting insights. I have eliminated some prospective holdings (either they don’t exist or were online access only), but I’ve uncovered at least as many that simply weren’t in the ESTC or WorldCat when I used them, for whatever reason. I knew there would be missed volumes, so that isn’t that surprising, but the number and type of them are still intriguing. As an example, an early data point (we’ll see if it holds) is that out of those 20, six libraries have multiple pre-1850 editions of the Memoirs of the Duke of Sully. What does that mean? I’m not sure, but it’s something to ponder and look into further if it holds up.

As far as the database design goes, I’ve put it aside for the moment. I could, in theory, enter holdings in as I confirm them (almost certainly a good idea, now that I think about it) but I would like to get a bit more done in confirmation first, and then perhaps have phases of entering data as opposed to a more constant back and forth.

Number of libraries confirmed: 20
Number of libraries entered into the database: 2
Number of extant copies confirmed: 77

State of the Project, July 2023

Marbling from the end papers of a fantastic book at the Library Company of Philadelphia

Greetings, gentle readers. I am thrilled to share the news that I’ve finished Phase 2.1! I’ve finally completed* aggregating all the map location data into one huge Excel workbook. Now I can pull it all together into a single worksheet and create some pivot tables to help me cross-reference locations. I have also added all the rest of the maps to the Maps page here on the site, for anyone who’s interested in seeing them. Just as a note, the maps are cleared of duplicates, but do not yet represent verified holdings.

The next step is Phase 2.2, wherein I put all the data in a single sheet, create a massive pivot table, and break out the results by library/institution so I can see which institutions have what books and start planning the in-person gathering of bibliographic data, as well as the applications to fund the travel required to visit those collections. I’m not sure how long that portion of the project will take, but I think it should be considerably less than the previous phase if only due to the relative lack of data entry.
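
The pivot-table step described above amounts to grouping entries by library and counting them per title. Here is a minimal sketch of that idea in plain Python rather than Excel; the library names, the row layout, and the function name `pivot_by_library` are all hypothetical, chosen only to show the shape of the result.

```python
from collections import defaultdict

# Hypothetical (library, title) pairs standing in for worksheet rows.
rows = [
    ("Library A", "The Female Quixote"),
    ("Library A", "Memoirs of the Duke of Sully"),
    ("Library B", "The Female Quixote"),
    ("Library A", "The Female Quixote"),
]


def pivot_by_library(rows):
    """Count entries per title within each library, like a pivot table
    with libraries as rows and titles as columns."""
    table = defaultdict(lambda: defaultdict(int))
    for library, title in rows:
        table[library][title] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {lib: dict(titles) for lib, titles in table.items()}


holdings = pivot_by_library(rows)
for library in sorted(holdings):
    print(library, holdings[library])
```

Reading the result library by library gives the "which institutions have what books" view that travel and fellowship planning would need.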

I also need, while this process continues, to start finalizing some decisions about the Heurist database I’m using. I’ve started working on importing some data and creating the structure and relations based on the data I have. I’m still very much in the mess-around stage of making the database — there are no permanent decisions in place yet. I’m happy with how things are shaping up, though.

Finally, as a few data points:

  • The last title I put into the workbook was The History of Eliza.
  • Since June’s update, I’ve entered 472 entries across four titles, including the most famous of Lennox’s works, The Female Quixote (which had 309 entries).
  • 2612 records total at the end of Phase 2.1.

*I fully expect to find material that I’ve accidentally left out or overlooked. No process is perfect, after all.

State of the Project, June 2023

A green landscape, looking out from a shady wooded area into a sunlit yard, with a large statement tree off to the right side and a wooden fence in the distance.

The view from the back of my house

Greetings, gentle readers! The summer has almost returned, and I’m managing to make this post mid-month as opposed to nearly-done-month. Overall I’m quite pleased with my industry.

Insofar as the project goes, finishing the semester has done wonders for my ability to keep working on my data. I completed working on the Marquis de Sully finally and was able to likewise finish Old City Manners, Philander, Poems Upon Several Occasions*, and Shakespear Illustrated, the latter just this evening. I’m very happy with the rate of progress I’m making.

In addition to working through the data and cleaning it up, I’m currently trying to work through two different problems. The first thing I’m trying to sort out is regarding periodical reprints of Lennox’s work. I want to catalog not simply the stand-alone volumes of her works, but also the various reprints, both partial and complete, of her work in periodicals of the time. The problem is, how do I track them? Using the Lennox bibliography in Susan Carlile’s book, Charlotte Lennox: An Independent Mind, I have a list of excerpts in various publications.

The questions, though, are these: 1) is that actually all of them? and 2) (and this is a big one) how do I treat these periodicals within the same project as more traditional codex books? Lennox even had her own periodical, The Lady’s Museum. The same periodical may have (and in some cases does have) multiple samples from her various works across time. How do I record that data so that nothing is lost and yet I’m also not doubling my own effort? It’s not that this is a particularly complicated problem; it’s just that the solution I pick will necessarily inform the shape of the project as it goes, so I’d rather do my best to choose something that won’t cause problems later if I can.

The next issue, mostly unrelated to the above procedural quandary, is how to set up my database so that different records can have the same title without it being a gigantic mess. This is actually the easier one to answer, most likely, as I’m sure I’m not the first person building a relational data structure to have multiple entries with the same name, for example, but different data attached to each one. I’m working on doing some reading and I’ve got some feelers out with some data-oriented DBA people I know, and I’ll likely have an answer to this later this month. Once I do, I can keep working on the structure on my Heurist database and importing the material I’ve currently got in spreadsheets. In the meantime, I’ll keep working on the organization and getting the location data sorted, with the goal of being finished with it and organizing the next phase of the project by the fall, aka library fellowship application season.
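
One common relational answer to the same-title problem is to stop using the title as the identifier: give every record its own numeric key and store the title once in a separate table. The sketch below shows that pattern in plain SQLite. It is not how Heurist defines its own structures, and the table names, column names, second publication year, and bookseller values are placeholders invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce REFERENCES
conn.executescript("""
    CREATE TABLE work (
        work_id INTEGER PRIMARY KEY,   -- surrogate key, not the title
        title   TEXT NOT NULL
    );
    CREATE TABLE edition (
        edition_id INTEGER PRIMARY KEY,
        work_id    INTEGER NOT NULL REFERENCES work(work_id),
        pub_year   INTEGER,
        bookseller TEXT
    );
""")

# The title is stored once; each edition points back to it by key.
conn.execute("INSERT INTO work (work_id, title) VALUES (1, 'The Female Quixote')")
conn.executemany(
    "INSERT INTO edition (work_id, pub_year, bookseller) VALUES (?, ?, ?)",
    [
        (1, 1752, "Placeholder Bookseller A"),
        (1, 1783, "Placeholder Bookseller B"),  # hypothetical later edition
    ],
)

# Two distinct records share one title but carry different data.
rows = conn.execute("""
    SELECT e.edition_id, w.title, e.pub_year
    FROM edition AS e
    JOIN work AS w ON w.work_id = e.work_id
    ORDER BY e.pub_year
""").fetchall()
print(rows)
```

Because the records are told apart by `edition_id`, any number of them can carry the same title without colliding.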

Current Data Category: Shakespear Illustrated
# of entries in this category to date: 126
# of entries in the worksheet so far: 2140 and counting

State of the Project, May 2023

red tulips next to a stone against a field of brown mulch

Tulips from my garden before the deer ate them.

Here it is, the middle of May already. Where is the April post, you may ask? Well, the April post sadly went the way of the rest of my month of April, swallowed whole by the end of the semester and grading. I got nothing done on the project to speak of in April, though I did find my way into some interesting discoveries.

At the beginning of April/end of March, I attended the ASECS (American Society for Eighteenth-Century Studies) annual conference, this year held in St. Louis. While I was there I went to a fantastic panel (okay, many fantastic panels), but one in particular pointed to a very interesting potential path forward for the Lennox project. This panel included a paper by Norbert Schürer (CSULB), who was discussing a digital humanities project being created using the Heurist platform, a customizable relational database system designed for humanities research. The platform is free to use and hosted by the University of Sydney. It is built on MySQL, which should make it straightforward to export the data for hosting elsewhere or to move it to new homes and interfaces down the line. It can also generate a website interface and has mapping and network visualization capabilities.

To date, no one else has used the platform for a descriptive bibliography, so a lot of the relationships and information types I need for my project do not yet exist. Before I start putting in extensive book data, however, I want to take the information I do have and create a locational database that takes the map data sets I’ve created and pulls them together for more effective research planning. To that end, I’ve created a test database and been futzing around with it in my spare time, which has not been terribly plentiful over the past month but should ease up considerably over the summer.

I was torn for a time on how to proceed, as it might be less time consuming simply to switch over to inputting data into the database directly. I think I’ve decided, though, to continue putting entries into the spreadsheets for now while I try to figure out the structures I need in Heurist and build something useful. To that end, I’ve started inputting data again and am nearly done with the Marquis de Sully, which is a relief. I’ll keep you posted on how it all goes.

Current Data Category: Memoirs of the Duke de Sully translation
# of entries in this category to date: 953
# of entries in the worksheet so far: 1505 and counting