(This post is part of a two-post series—I ended up having too much to say about the two poems I looked at with VADER and Pattern, so I split it up. Second half can be found here!)
Quincy Troupe’s “Come Sing a Song”—the 11-line poem that opens his 1972 collection Embryo Poems, 1967-1971—welcomes the reader with a series of invitations that are also requests. Apostrophizing in the imperative, the speaker begins with an appeal (“Come sing a song, Black Man”) and goes on to make similar appeals in almost every subsequent line. For example, the final three lines of the poem:
sing jazz, rock, or, R & B,
sing a song Black Man,
sing a “bad” freedom song
A first reading of this poem might see it as an invitation for black life to be newly acknowledged, recognized, and celebrated. More specifically, the speaker grounds this “singing” in black history, particularly black music, asking elsewhere in the poem that “a blues,” “a blackblues song,” and “a work song” be sung. In this sense, we might see in this poem a desire that this recognition and celebration of black life be sung by black voices with an ear for black audiences.
In the context of the Black Arts Movement, this reading makes intuitive sense: many of Troupe’s contemporaries regularly invoked and entered into dialogue with black music (listen, for example, to Jayne Cortez’s performance of her poem “How Long Has Trane Been Gone”; or to Sonia Sanchez as she discusses her first public reading of “a/coltrane/poem”). And many writers of the BAM sought explicitly to make art that spoke first and foremost to black communities. See, as two examples among many, Addison Gayle, Jr.’s extensive introduction to the edited collection The Black Aesthetic (1972), in which he argues that “today’s black artist … has given up the futile practice of speaking to whites, and has begun to speak to his brothers” (xxi); or Haki Madhubuti’s (then writing as Don L. Lee) 1968 article in Black World / Negro Digest, where he claims that “Black poets write out of a concept of art for people’s sake and not art for art’s sake. … The black poet is writing to black people and not to whites” (28).
This reading is also in keeping with the general scholarly consensus on Troupe’s work. The first sentence of his entry in the academic reference series Black Literature Criticism describes Troupe as “an acclaimed African American author whose jazz-inflected poems explore political and personal themes and celebrate the contributions of black artists, writers, musicians, and athletes” (310). With all this in mind, it makes sense that “Come Sing a Song,” which opens the collection, reads almost like an invocation, asking black voices to sing songs celebrating black life.
PatternAnalyzer, the default sentiment implementation in TextBlob (which makes use of the Pattern sentiment classifier), considers “Come Sing a Song” to be the single most negative poem in the entire corpus. As one might gather from my reading above, I disagree strongly with Pattern’s judgment in this case. In a corpus of poetry containing direct attacks, extreme invective, and explicit takedowns of individuals, groups, and institutions, I did not find this poem to contain an exceptional amount of negative sentiment. On the contrary, I found “Come Sing a Song” to be positive and celebratory.
So: Pattern’s reading of this poem and mine do not line up. This program, with features designed to evaluate sentiment in text, is, to my mind, clearly missing something with regard to the sentiment in “Come Sing a Song.” That said, however wrong I find Pattern’s understanding of this poem to be, I don’t find this wrongness particularly bizarre or bewildering. Pattern is, after all, just following instructions: making programmatic decisions about how much positive or negative sentiment is in a given snippet of text according to rules given to it by humans. As its creators explain in their article, they intended Pattern to be “a Python package” that “provides general cross-domain functionality” across “web mining, natural language processing, machine learning and network analysis, with a focus on ease-of-use,” a goal at which it is very successful. They did not intend Pattern to be a thoughtful or savvy reader of modern American poetry. Knowing this, how are we to make sense of Pattern’s analysis of Quincy Troupe’s poem?
Using Pattern and VADER to Read BAM Poetry
In this post, I hope to make sense of disagreements like this—between a sentiment classifier like Pattern and a trained human reader like myself—with an eye for the larger use context of sentiment analysis, and my use of it in the study of poetry. I’ll do so by talking about two sentiment classifiers—Pattern (via TextBlob) and VADER—and some initial results of using these programs to analyze my corpus of texts. Because of the scale of my analysis—26 books of poetry—these results are, for the most part, exploratory. I haven’t used these tools to try to make any general claims about this incredibly diverse body of poetry. Rather, my analyses have tried to incorporate insights from these methods into existing scholarly conversations, while also confirming how fraught these methodologies can be when used to analyze poems that regularly tie formal experimentation to an explicitly political quest for racial justice.
Another note: while I won’t delve into it in this post, it is important to acknowledge that many of these poets were highly attuned to decontextualized reading practices at work during the period, particularly those that aimed to undermine their quests for justice: government surveillance programs, active FBI counterintelligence operations, and a larger cultural climate fearful of radical thought. In this sense, the use of distanced computational techniques like Pattern and VADER might be seen as a troubling echo of the interpretive practices employed by FBI agents (a group one scholar describes as “ghostreaders”) involved in J. Edgar Hoover’s COINTELPRO, a 1956-1971 FBI program designed, in Hoover’s own words, to “expose, disrupt, misdirect, discredit, or otherwise neutralize the activities of black nationalist, hate-type organizations and groupings, their leadership, spokesmen, membership, and supporters” (3). This program systematically targeted groups and individuals fighting against racial injustice, including operations against targets within the Black Panther Party as well as Martin Luther King, Jr. In these operations the FBI often illegally and intentionally violated the rights of those targeted: in its final report, the Senate Church Committee formed to investigate oversight in government intelligence activities declared, among other things, that “[i]ntelligence agencies have undermined the constitutional rights of citizens.”
Put simply, my goal here has been to explore the possibility of using these computational tools in a way that pursues questions, problems, and lines of inquiry centered around black thought and experience, including longstanding concerns and topics of interest in BAM scholarship. So if I want to know more about what natural language processing techniques can show us about things BAM scholars have already identified as noteworthy in this poetry (e.g., strategically heightened affects), I also need to take a hard look at what Pattern or VADER do on a line-by-line basis. This includes questioning the biases and assumptions these programs bring to their evaluations. In this two-part post, I’ll do so by comparing results from Pattern and VADER in the analysis of two poems: first, Troupe’s “Come Sing a Song,” and second, Nikki Giovanni’s “The True Import of Present Dialogue, Black vs. Negro” (from her 1968 Black Feeling, Black Talk).
I have chosen these two poems in particular because each, in one way or another, came to the fore over the course of my exploratory use of computational methods. For example: though both sentiment classifiers consider Troupe’s Embryo Poems as a whole to be somewhere in the middle of my corpus in terms of sentiment, Pattern thinks “Come Sing a Song” has more negative sentiment than any other individual poem. Likewise, VADER considers Giovanni’s “The True Import” to have the most negative sentiment in the corpus, but Pattern disagrees to the point of assigning the poem a positive score.
In short: even when I disagree with the findings of these programs, computational methods have helped guide me, in an exploratory fashion, to poems or groups of poems which I have then re-read, thought through, and analyzed using more conventional literary methods (e.g., historical contextualization, close reading, and consideration of relevant scholarship). What follows is a small window into the first stages of this process, showing how my thinking has played out in two examples.
Troupe’s “Come Sing a Song”
Which brings us back to Troupe’s poem. To recap: Pattern considers “Come Sing a Song” to be the most negative poem in my corpus. I do not. In the opening paragraphs of this post, I discussed how I as a human reader thought through the positive, celebratory affective dimensions of this poem, looking to historical context, BAM scholarship, and so on. So how, exactly, does Pattern go about evaluating “Come Sing a Song” for its sentiment?
Because it is a lexicon-based classifier, Pattern’s sentiment analysis of a poem boils down to checking each word in a given snippet of text (in this case a line) against a dictionary of words it already knows to be “positive” or “negative” (rated on a scale from -1.0 to 1.0). Pattern’s dictionary of words with sentiment scores draws from another lexical database called WordNet and is available to peruse in its entirety on GitHub. Roughly speaking, after scoring each line based on the values of the words found in its dictionary, Pattern weights the polarity scores of the lines to produce a score for the entire poem. (Side note: TextBlob’s PatternAnalyzer might modify how this “weighting” process unfolds, but not as far as I can gather from their GitHub page and their sentiment lexicon; their documentation describes PatternAnalyzer as a “[s]entiment analyzer that uses the same implementation as the pattern library.”) Either way, this final score represents the sentiment value of the poem. In the case of “Come Sing a Song,” it is -0.156.
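To make the line-then-poem logic concrete, here is a minimal sketch of a lexicon-based scorer. Everything in it is an assumption for illustration: the tiny lexicon is hypothetical (these are not Pattern’s real values), the tokenizer is a simple regex, lines with no dictionary hits count as 0.0, and the poem score is an unweighted mean of line scores, which may not match how Pattern or TextBlob actually weight things.

```python
import re

# Hypothetical toy lexicon (word -> polarity in [-1.0, 1.0]); not Pattern's real values.
TOY_LEXICON = {"good": 0.7, "free": 0.4, "bad": -0.7}

def score_line(line, lexicon):
    """Mean polarity of the words found in the lexicon; 0.0 when none match."""
    hits = [lexicon[w] for w in re.findall(r"[a-z]+", line.lower()) if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

def score_poem(lines, lexicon):
    """Assumption: the poem score is the unweighted mean of its line scores."""
    scores = [score_line(line, lexicon) for line in lines]
    return sum(scores) / len(scores) if scores else 0.0

poem = ["a good free song", "a bad song", "sing"]
print([round(score_line(l, TOY_LEXICON), 2) for l in poem])  # [0.55, -0.7, 0.0]
print(round(score_poem(poem, TOY_LEXICON), 3))               # -0.05
```

One consequence of this design choice: a line with no dictionary words still pulls the poem average toward zero, so a poem with many “unknown” lines will look more neutral than its few scored lines suggest.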
So where does Pattern’s dictionary of positive and negative adjectives come from? As they explain on their website, Pattern’s classifier learned which adjectives were positive or negative based on the kinds of adjectives that appeared in positive and negative product reviews. This training process represents a relatively standard workflow in machine learning: in broad strokes, a corpus of text is marked up by hand (in this case as either positive or negative); a program then “trains” or “learns” to identify positive or negative text by seeing lots of examples of each and generalizing rules that will help it to make accurate predictions in the future; the classifier is then tested or validated by being asked to evaluate the sentiment of texts the creators already know to be positive or negative (also usually marked up by teams of humans). After much tweaking, once the predictive results are acceptably accurate, creators often make their classifiers available for use online, where people like me use them for all kinds of things outside the scope of the product’s original intentions.
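The train-then-validate workflow described above can be sketched in a few lines. This is a toy illustration, not how Pattern’s lexicon was actually built: the “reviews” are invented, and the counting rule (a word’s polarity is the share of its uses in positive reviews minus the share in negative ones) is a simple stand-in chosen for clarity.

```python
from collections import Counter

# Invented, hand-labeled "product reviews" (1 = positive, -1 = negative).
TRAIN = [
    ("great sturdy blender", 1),
    ("great value and sturdy", 1),
    ("flimsy and noisy", -1),
    ("noisy terrible motor", -1),
]
# Held-out examples whose labels we already know, used only for validation.
HELD_OUT = [("sturdy and great", 1), ("terrible flimsy lid", -1)]

def learn_lexicon(examples):
    """Polarity of a word = (positive uses - negative uses) / total uses, in [-1, 1]."""
    pos, neg = Counter(), Counter()
    for text, label in examples:
        for word in set(text.split()):
            (pos if label > 0 else neg)[word] += 1
    return {w: (pos[w] - neg[w]) / (pos[w] + neg[w]) for w in pos.keys() | neg.keys()}

def predict(text, lexicon):
    """Sum the learned polarities; unknown words contribute 0.0."""
    total = sum(lexicon.get(w, 0.0) for w in text.split())
    return 1 if total >= 0 else -1

def accuracy(examples, lexicon):
    """Fraction of labeled examples the lexicon classifies correctly."""
    correct = sum(predict(t, lexicon) == label for t, label in examples)
    return correct / len(examples)

lexicon = learn_lexicon(TRAIN)
print(lexicon["sturdy"], lexicon["noisy"], lexicon["and"])  # 1.0 -1.0 0.0
print(accuracy(HELD_OUT, lexicon))                          # 1.0
```

Note what the sketch inherits from its training data: a word’s “sentiment” is simply whatever company it kept in the labeled reviews, which is exactly why a lexicon built this way can carry associations that make little sense in a poem.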
Now, you may be wondering, why on Earth would someone use a classifier that was trained on product reviews (rather than on poetic corpora) to evaluate something as rhetorically complex as poetry? For a number of mostly pragmatic reasons: perhaps most significantly, creating one of these sentiment lexicons from scratch is an extremely time-consuming process. It takes entire teams of people and lots of resources. My goal in this project is not to develop a sentiment classifier that works on experimental poetry in English. Rather, it’s to see what existing classifiers can show us about a specific corpus of poetry—not just Pattern and VADER, though those are the only ones I will discuss today.
So we now know that Pattern’s sentiment analysis features were not designed to evaluate the sentiment in Troupe’s “Come Sing a Song.” And, having used them to evaluate this poem, I find that the results seem to confirm this. Consider the lines of the poem I’ve discussed above and their corresponding sentiment scores in Pattern (rounded to three decimal places):
1. Come sing a song, Black Man, … -0.167
9. sing jazz, rock, or, R & B, … 0.000
11. sing a “bad” freedom song … -0.700
Pattern assigns the six-word snippet of line 9 a score of 0.000 because none of these words appear in its sentiment lexicon. It assigns line 11 a score of -0.700 because, of the three definitions of “bad” Pattern knows, each sense of the adjective has a score of roughly -0.7 (with some variation in the averaging due to the “confidence” of the score—accounting, I’m guessing, for variation in the original human markup).
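A quick way to see why line 9 scores 0.000 and line 11 scores -0.700 is to print which tokens a line produces and which of them the dictionary recognizes. In this sketch the one-entry lexicon is a hypothetical stand-in that only reflects what is described above (“bad” at roughly -0.7, nothing else in lines 9 or 11 listed), and the regex tokenizer is an assumption.

```python
import re

# Hypothetical stand-in for Pattern's lexicon, limited to what is described above.
TOY_LEXICON = {"bad": -0.7}

def tokenize(line):
    """Assumed tokenizer: lowercase the line and keep only runs of letters."""
    return re.findall(r"[a-z]+", line.lower())

def explain_line(line, lexicon):
    """Return the tokens, the (word, polarity) hits, and the mean polarity of the hits."""
    tokens = tokenize(line)
    hits = [(t, lexicon[t]) for t in tokens if t in lexicon]
    score = sum(p for _, p in hits) / len(hits) if hits else 0.0
    return tokens, hits, score

for line in ["sing jazz, rock, or, R & B,", "sing a “bad” freedom song"]:
    tokens, hits, score = explain_line(line, TOY_LEXICON)
    print(len(tokens), hits, round(score, 3))
# 6 [] 0.0
# 5 [('bad', -0.7)] -0.7
```

Printing the hits alongside the score is what makes the result legible: a 0.0 can mean “neutral,” but here it simply means “no word in this line is in the dictionary.”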
Pattern’s evaluation of line 1 as -0.167 presents a more troubling problem. The only adjective in this snippet that appears in Pattern’s dictionary is the word “black.” Looking at the code, I found that Pattern knows three meanings of this adjective:
1. “of or belonging to a racial group having dark skin especially of sub-Saharan African origin”, polarity = 0.0
2. “extremely dark”, polarity = -0.4
3. “being of the achromatic color of maximum darkness”, polarity = -0.1
Two of these meanings have a negative polarity (sentiment score) associated with them. As far as I can tell, because Pattern has no idea which sense of the word is being used here, it averages the polarity scores of the three senses to assign a sentiment to the line: -0.167.
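The arithmetic is easy to reproduce: (0.0 + (-0.4) + (-0.1)) / 3 = -0.5 / 3 ≈ -0.167. The sketch below applies that sense-averaging to whole lines, using only the three sense polarities listed above. Leaving every other word out of the toy lexicon is an assumption, but it matches the observation that “black” is the only dictionary word in line 1 and that Pattern scores Mari Evans’s line the same way.

```python
import re
from statistics import mean

# The three sense-level polarities for "black" listed above; no other words included.
SENSES = {"black": [0.0, -0.4, -0.1]}

def word_polarity(word):
    """Average the polarity of every known sense; None if the word is unknown."""
    senses = SENSES.get(word)
    return mean(senses) if senses else None

def line_polarity(line):
    """Mean polarity of the words the toy lexicon knows; 0.0 if it knows none."""
    scores = [p for w in re.findall(r"[a-z]+", line.lower())
              if (p := word_polarity(w)) is not None]
    return mean(scores) if scores else 0.0

print(round(word_polarity("black"), 3))                         # -0.167
print(round(line_polarity("Come sing a song, Black Man,"), 3))  # -0.167
print(round(line_polarity("I am a black woman"), 3))            # -0.167
```

Because the averaging is blind to which sense is meant, the neutral racial sense (0.0) is always diluted by the two “darkness” senses, so every use of the word lands below zero.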
This means that, yes, whenever Pattern sees the word “black” in my corpus, it assigns the word a negative sentiment value. This is bad. Even acknowledging that a given tool can’t do all things in all contexts, this is bad. Technology has a long history of reinforcing racism and racist power structures in America; if Pattern’s out-of-the-box sentiment analysis capabilities read the word “black” as expressing negative sentiment—even if the one “sense” of the word referring to race in its dictionary is neutral—that is a huge problem. Moreover, in a project examining poetry from the Black Arts Movement, this particular look “under the hood” renders Pattern’s findings not only extremely troubling, but practically incoherent. If Pattern assigns a sentiment score of -0.167 to the line “I am a black woman” from Mari Evans’s poem of the same name—which it does—it’s hard to see the tool as anything but disturbingly biased in terms of race and sentiment.
What’s more, this problem only became visible to me because I stumbled across it while stopping to look more closely at what felt like weird results. Nothing I could find in Pattern’s (or TextBlob’s) documentation explained how these word-by-word judgment calls would be made, i.e., that it would basically average the scores of different senses of a word in evaluating its sentiment. The discovery came from experimentation at the word or sentence level (a scale that is often beneath the scope of larger computational projects), as well as careful digging through documentation dispersed over multiple webpages, published articles, and commented code. This isn’t any particular fault of Pattern’s, but rather indicative of the way that even accessible products designed with “a focus on ease-of-use” have elements that can feel blackboxed: the details are in there somewhere, even if only implicitly in the inner workings of the code itself, but they can be hard to find. But because I’m working at the scale that I am and have purposefully spent time in the technical weeds, this particular bias was clear as day.
VADER’s Conflicting Feelings on “Come Sing a Song”
Fortunately, Pattern is not the only sentiment classifier available for projects like mine. VADER (short for “Valence Aware Dictionary and sEntiment Reasoner”) is described by its creators as a “parsimonious rule-based model for sentiment analysis of social media text.” Like Pattern, VADER uses a sentiment lexicon (or dictionary). Unlike Pattern, VADER was built specifically with an eye for the “sentiment-oriented language of social media text, which is often expressed using emoticons, slang, or abbreviated text such as acronyms and initialisms” (9). Moreover, VADER was designed to incorporate context for these words: “grammatical and syntactical conventions that humans use when expressing or emphasizing sentiment intensity” (1), recognizing, in other words, the difference between “good” and “very good,” or “it was great” and “it was great!!!!”. VADER’s sentiment lexicon is available here; it includes the final weighted score of each item along with the original ratings from ten Amazon Mechanical Turk raters (on a scale of -4 to 4) that went into each evaluation.
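The kind of intensity rule VADER describes can be sketched roughly as follows. This is a simplified illustration, not VADER’s implementation: it models only two of its heuristics (a degree modifier like “very” and exclamation points) plus a normalization into the range -1 to 1, and skips negation, “but,” capitalization, and emoticons. The lexicon values are hypothetical, and the constants (0.293 per booster, 0.292 per “!” capped at four, normalization alpha of 15) are assumptions loosely modeled on values in VADER’s open-source code.

```python
import math
import re

LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5}  # hypothetical valences on a -4..4 scale
BOOSTERS = {"very": 0.293}   # assumed increment for a preceding degree modifier
EXCLAIM_STEP = 0.292         # assumed amplification per "!" (capped at 4 marks)

def normalize(total, alpha=15.0):
    """Squash an unbounded valence sum into (-1, 1)."""
    return total / math.sqrt(total * total + alpha)

def compound(text):
    words = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, word in enumerate(words):
        if word not in LEXICON:
            continue
        valence = LEXICON[word]
        prev = words[i - 1] if i > 0 else None
        if prev in BOOSTERS:
            # A booster pushes the valence further from zero in its own direction.
            valence += math.copysign(BOOSTERS[prev], valence)
        total += valence
    bangs = min(text.count("!"), 4)
    if total:
        total += math.copysign(bangs * EXCLAIM_STEP, total)
    return normalize(total)

for text in ["good", "very good", "it was great", "it was great!!!!"]:
    print(text, round(compound(text), 3))
```

The point of the sketch is the design: VADER starts from word-level valences, like Pattern, but then adjusts them using the words and punctuation around them before combining.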
With regard to the sentiment in “Come Sing a Song,” VADER seems closer to the mark. The three lines it sees as having negative sentiment make more sense to me: each includes either the phrase “Blind Joe Death” or “prison chain gang,” both of which carry stronger negative feelings and associations (“death,” “prison,” and “blind” all have negative scores in VADER’s lexicon). What’s most interesting, however, is VADER’s valuation of the final line, “sing a ‘bad’ freedom song,” which it scores as slightly positive.
Pattern considers this final line to be the most negative in the poem, since “bad” is the only word in the line that appears in Pattern’s dictionary. While VADER also scores “bad” as negative (-2.5, with the maximum being -4), it scores “freedom” as positive (3.2, with the maximum being 4). In other words, with regard to intensity prior to grammatical context, VADER weighs “freedom” as more positive than “bad” is negative.
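Here is the tug-of-war in numbers, as a minimal sketch: summing the two lexicon values quoted above gives -2.5 + 3.2 = 0.7, and a normalization of the form x / sqrt(x² + 15) (the shape VADER uses for its compound score, with alpha = 15 taken here as an assumption) turns that into a small positive number. Treating every other word in the line as absent from the lexicon, and ignoring VADER’s other rules, are simplifications, so this should not be read as VADER’s exact output for the line.

```python
import math

# Lexicon values quoted above; all other words are assumed absent for this sketch.
LEXICON = {"bad": -2.5, "freedom": 3.2}

def normalize(total, alpha=15.0):
    """Map an unbounded valence sum into (-1, 1)."""
    return total / math.sqrt(total * total + alpha)

def line_compound(line):
    """Sum word valences (unknown words count as 0.0), then normalize."""
    return normalize(sum(LEXICON.get(w, 0.0) for w in line.lower().split()))

print(round(line_compound("sing a bad freedom song"), 3))  # 0.178
```

Neither word “wins” outright; the slightly positive result is just what is left after the two valences mostly cancel.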
On its own, this math isn’t that interesting. What’s interesting is that, in having to weigh these values against one another in this final line, we see VADER’s classifier struggling with the layered meanings of Troupe’s words. The classifier has conflicting feelings. A human reader, of course, would also be doing this, and with much, much more nuance. Just looking at the original punctuation, we can see that Troupe has marked off the word “bad” by putting it in quotation marks—a sign that something special might be going on with this word and how it is being used. My mind goes first to “bad” as it appears in the title of Sonia Sanchez’s 1970 collection We a BaddDDD People, which I discussed in my previous post. This “bad” doesn’t read so much as negative as it does “dangerously good,” to quote from William J. Maxwell’s work on African American literature and the FBI (289). In this sense, singing a “‘bad’ freedom song” feels again like an invitation to celebrate—in this case, the “dangerously good” work of black individuals in the struggle for freedom in America, past and present.
But VADER isn’t reading for things like this. As for the quotation marks surrounding the word “bad”: as I discussed in a previous post, my program removes punctuation and capital letters from each line for tokenization purposes before the text ever reaches VADER. Instead, VADER is working only from surface meanings, the denotation of these words rather than their potential connotations. But while VADER knows nothing about history, freedom, singing, or the possibility that the word “bad” might actually mean “good,” we can see the classifier in its own way trying to sort out the layers of meaning in this line: whatever is in this snippet of text might be both negative and positive at the same time, or in different ways.
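The following sketch shows how a cleanup step of this kind erases Troupe’s quotation marks. The function is hypothetical, not my project’s actual preprocessing code: it lowercases the line and deletes every character that is neither a word character nor whitespace, which also catches curly quotes (Python’s `string.punctuation` covers only ASCII, so a `str.translate` approach built on it would miss them).

```python
import re

def preprocess(line):
    """Hypothetical cleanup: lowercase, then delete anything that is not a word
    character or whitespace (curly quotes and commas included)."""
    return re.sub(r"[^\w\s]", "", line.lower())

print(preprocess("sing a “bad” freedom song"))    # sing a bad freedom song
print(preprocess("Come sing a song, Black Man,"))  # come sing a song black man
```

After this step, the marked-off “bad” is indistinguishable from an unmarked one, so whatever signal the quotation marks carried is gone before any classifier sees the line. Apostrophes are stripped too (“don’t” would become “dont”), another small loss worth keeping in mind.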
For me, this instance of conflicting feeling represents an excellent jumping-off point for the larger questions of how feeling, affect, and sentiment might be operating in a poem or group of poems: the cues and signals that VADER struggles with but human readers can almost take for granted; the biases that classifiers bring to evaluating feelings versus those of a human reader; the way individual words carry affective weight both in and in spite of their context.
But that’s all for now—the second half of this post, which dives into VADER’s reading of Nikki Giovanni’s “The True Import” and offers more general reflections on this research process as a whole, will follow shortly!