Flourish Embedded Figure
Introduction:
For this midterm project, I chose to analyze common themes in Charles Dickens’s public speeches, which date from 1841 to 1870. Instead of closely reading one specific speech, I used word frequency analysis to look for patterns across almost three decades of his speeches. My goal was to see what kinds of ideas he consistently emphasized when speaking in civic and public settings.
Sources:
I first downloaded the unedited plain-text file (SpeechesOfCharlesDickens.txt) from the class drive; the file originally came from Project Gutenberg, where the ebook is titled Speeches of Charles Dickens. The original file contained a lot of material beyond the speeches. Information about editors and publication details appeared before and after the ebook itself, and the ebook also included a good deal of introductory material about Dickens’s speeches. Since I was interested only in the themes of the speeches themselves, I removed all non-speech material so that only his spoken words would be analyzed.
After cleaning the file so that it contained only the speeches, I uploaded the text into Voyant Tools. In addition to the standard stop words that Voyant applies automatically, I added custom stop words to remove common forms of address like “gentlemen,” “ladies,” and “sir,” as well as other words, such as “shall,” “say,” and “place,” that reflect the conventions of a public address rather than thematic content. This helped focus the analysis on what I judged to be more meaningful, thematic vocabulary.
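The stop-word filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not what Voyant does internally: the short base list is a hypothetical stand-in for Voyant's much longer built-in list, the function name count_terms is my own, and the sample sentence is invented rather than quoted from Dickens.

```python
import re
from collections import Counter

# Hypothetical stand-in for Voyant's built-in stop-word list (the real one is far longer).
BASE_STOP_WORDS = {"the", "and", "of", "to", "a", "in", "i", "that", "is", "it"}

# Custom stop words named in this write-up.
CUSTOM_STOP_WORDS = {"gentlemen", "ladies", "sir", "shall", "say", "place"}


def count_terms(text, stop_words):
    """Lowercase the text, split it into words, drop stop words, and count the rest."""
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(word for word in words if word not in stop_words)


# Invented sample sentence, used only to show the effect of the filter.
sample = (
    "Ladies and gentlemen, I shall say that the public institution "
    "is a great public good."
)
counts = count_terms(sample, BASE_STOP_WORDS | CUSTOM_STOP_WORDS)
print(counts.most_common())
```

Keeping the custom words in their own set makes it easy to see, and later reconsider, which removals were my own judgment calls rather than defaults.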
Processes:
After cleaning, I had a final version in Voyant that listed the most frequent relevant words in Dickens’s speeches. Instead of using Voyant’s automatic word cloud, I decided to export the data to Excel. I selected the top 20 words by frequency to keep the visualization clean and then uploaded the .csv file into Flourish to make a horizontal bar chart. I chose a bar chart because it clearly shows differences in frequency and allows for easy comparison between terms, which keeps the emphasis on analysis.
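For comparison, the same "top terms to .csv" step could be done in code instead of Excel. The sketch below is an assumption about how that might look, not the workflow I actually used: the function name top_terms_csv is hypothetical, and the counts are made-up placeholder values, not results from the speeches.

```python
import csv
import io
from collections import Counter


def top_terms_csv(counts, n=20):
    """Return CSV text with the n most frequent terms, highest count first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names match the axis labels used on the chart.
    writer.writerow(["Term", "Frequency (Word Count)"])
    writer.writerows(counts.most_common(n))
    return buffer.getvalue()


# Placeholder counts for illustration only; these are not real results.
example_counts = Counter({"great": 5, "public": 3, "art": 2, "health": 1})
csv_text = top_terms_csv(example_counts, n=3)
print(csv_text)
```

The resulting text could be saved as a .csv file and uploaded to a charting tool in the same way as the Excel export.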
Presentation:
I embedded the visualization on a WordPress site that I created specifically for this project and set the newly created page as the homepage. The axes are labeled “Term” (independent variable) and “Frequency (Word Count)” (dependent variable), and a clear title and subtitle concisely describe the chart.
Significance:
By analyzing the results, I was able to see that Dickens’s speeches consistently focus on ideas of public life and a sense of shared community. Some of the top words, such as “institution,” “public,” “society,” “health,” “children,” “literature,” and “art,” show that he cared about community involvement and the role of culture. The word “great” appears most frequently in the chart. Although it could be considered a rhetorical word (and therefore could be removed), I chose to keep it because of just how frequently it is used. It highlights Dickens’s enthusiastic speaking style: he commonly placed “great” in front of another word, such as the name of a cause or institution, to elevate its meaning.
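One way to check the observation about “great” more systematically would be to count which words directly follow it. The following is a minimal sketch under stated assumptions: the function name words_after is hypothetical, the sample sentence is invented rather than taken from the speeches, and the simple word-pair approach ignores sentence boundaries.

```python
import re
from collections import Counter


def words_after(text, target):
    """Count the words that immediately follow `target` in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    # Pair each word with its successor; pairs can span sentence boundaries.
    return Counter(nxt for cur, nxt in zip(words, words[1:]) if cur == target)


# Invented sample text, used only to demonstrate the counting.
sample = "A great institution serves a great cause, and a great institution endures."
followers = words_after(sample, "great")
print(followers.most_common())
```

Run on the cleaned speech file, a count like this would show whether “great” attaches mostly to causes and institutions or is spread across many unrelated words.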
By applying a digital method to a set of historical speeches, this project shows how quantitative analysis can be used to point out patterns over time periods, whether long or short. Instead of looking closely at just one speech or a couple of speeches, we are able to see broad trends in how Dickens spoke about society. It shows how Digital Humanities methods can add another layer to understanding literary pieces, both historical and modern. At the same time, for all its strengths, word frequency is an objective but limited measure that does not capture tone or context, so it should be used as a guide alongside close reading rather than as a replacement for it.