4. Searching for Words

Let’s start by analyzing Moby Dick, which is text1 for NLTK. The first function we will look at is concordance. A concordance lists every occurrence of a word along with the text on either side of it. NLTK measures that context in characters, not words. By default, each output line is 79 characters wide in total (the target word plus the context on both sides, spaces included), and only the first 25 matches are printed. You can change both settings with the width and lines parameters. In the Jupyter Notebook, type:

text1.concordance("whale")

The output shows each of the first 25 occurrences of “whale” in Moby Dick, centered in a line of surrounding text. Let’s try this with another word, “love.” Just replace the word “whale” with “love,” and we get the contexts in which Melville uses “love” in Moby Dick. The same idea of looking at a word’s immediate surroundings underlies several other functions, including similar and common_contexts. Let’s now see which words appear in contexts similar to those of “love.” NLTK has a built-in function for this as well: similar.

text1.similar("love")

Behind the scenes, NLTK finds all the contexts in which the word “love” appears, looks for other words that occur in those same contexts, and lists the words that share the most contexts with “love.” This gives a sense of which words are used in similar ways. That is somewhat interesting in itself, but more interesting when we compare it to something else. Let’s take a look at another text. What about Sense and Sensibility (text2)? Let’s see what words are similar to “love” in Jane Austen’s writing. In the next cell, type:

text2.similar("love")

We can compare the two and see immediately that Melville and Austen use the word “love” differently.
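To make the idea behind similar more concrete, here is a minimal sketch in plain Python. It is not NLTK's implementation: the function name similar_words and the toy sentence are made up for illustration, and it assumes that a word's "context" is simply the pair of words immediately before and after it.

```python
from collections import Counter, defaultdict


def similar_words(tokens, word, num=5):
    """Rank other words by how many (previous, next) contexts they share with `word`.

    Illustrative sketch only; a hypothetical helper, not part of NLTK.
    """
    tokens = [t.lower() for t in tokens]
    word = word.lower()

    # Map every word to the set of (left neighbor, right neighbor) pairs it occurs in.
    contexts = defaultdict(set)
    for i in range(1, len(tokens) - 1):
        contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))

    target = contexts.get(word, set())

    # Score each other word by the number of contexts it has in common with the target.
    scores = Counter()
    for other, seen in contexts.items():
        shared = len(seen & target)
        if other != word and shared:
            scores[other] = shared

    return [w for w, _ in scores.most_common(num)]


# Toy data (an assumption for the example, not taken from any NLTK corpus).
tokens = "i love the sea and i hate the sea but i love the ship and i fear the ship".split()
print(similar_words(tokens, "love"))
```

In this toy sentence, “hate” and “fear” both occur between “i” and “the,” just as “love” does, so they are the words reported as similar. On a full novel the same counting produces a much longer and noisier list, which is why comparing two authors is more informative than reading one list on its own.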

Investigating “lol”

Let’s step away from novels for a minute and take a look at the NLTK Chat Corpus (text5). In chats, text messages, and other digital communication platforms, “lol” is exceedingly common. We know it doesn’t simply mean “laughing out loud,” so maybe the similar function can provide some insight into what it does mean.

text5.similar("lol")

The resulting list consists largely of greetings, which suggests that “lol” serves more of a phatic function. Phatic language is language used primarily to communicate social closeness. Phatic words stand in contrast to semantic words, which contribute meaning to the utterance.

If you are interested in this type of analysis, take a look at the common_contexts function in the NLTK book or in the NLTK docs.
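As a rough preview of what common_contexts reports, the sketch below finds the contexts that two or more words have in common. It is written in plain Python under the same simplifying assumption that a context is the pair of neighboring words; the helper name shared_contexts and the sample sentence are hypothetical and are not part of NLTK.

```python
def shared_contexts(tokens, words):
    """Return the (previous, next) word pairs in which every word in `words` occurs.

    Illustrative sketch only; a hypothetical helper, not NLTK's own code.
    """
    tokens = [t.lower() for t in tokens]
    wanted = {w.lower() for w in words}

    # Collect the contexts seen for each word we care about.
    seen = {w: set() for w in wanted}
    for i in range(1, len(tokens) - 1):
        if tokens[i] in wanted:
            seen[tokens[i]].add((tokens[i - 1], tokens[i + 1]))

    # Keep only the contexts that all of the requested words share.
    return set.intersection(*seen.values()) if seen else set()


# Toy data (an assumption for the example).
tokens = "a very large whale and a monstrous large whale met a very old sailor".split()
print(shared_contexts(tokens, ["very", "monstrous"]))
```

Here both words appear between “a” and “large,” so that single context is what the two have in common, while “a ... old” is dropped because only “very” occurs there.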

Terms Used in Lesson

Concordance

An NLTK function that shows the text on either side of each occurrence of a word; an easy way to investigate the context of a certain word across a corpus.

Phatic Language

Phatic language is language primarily for communicating social closeness. Phatic words stand in contrast to semantic words, which contribute meaning to the utterance.
