07: Strings

A lot of students struggle with the String type. Your understanding of this type is extremely important, and failure to understand will seriously impact your performance in labs. Therefore, I’ve decided to dedicate a post entirely to the use and manipulation of Strings.

What is a String?

In programming, a letter, symbol, space, new line, or tab is usually referenced by a numerical value (encoded by ASCII or Unicode). This is called a character, or char. A String is what we use to place several of these characters in sequence to form a word or other expression (ex. “dog”). It is literally a “string” of characters!
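
To see the numerical value behind a character, here is a minimal sketch using Python's built-in ord() and chr() functions. The variable names (code, letter, codes) are only for illustration.

```python
# ord() gives the numeric code of a character; chr() goes the other way.
code = ord("d")      # 100
letter = chr(100)    # "d"

# A string is a sequence of characters, so we can look up each code in turn.
codes = []
for ch in "dog":
    codes.append(ord(ch))

print(codes)         # [100, 111, 103]
```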

In Python, we will use the String type for any combination of characters, even if it is only one character. Strings are identified by placing the value between quotes:

myString = "Hello World."

Strings are usually used to store words that relay information, but they can also be used to store numbers. Note that if a number is defined as a string, you cannot perform arithmetic operations properly on it!

Try it yourself! How does Python handle these commands?

"2" * "2"   # ?  
"5" * 100   # ? 
"5" + "9.0" # ? 
"5"**2      # ?

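When you do need arithmetic on a number that is stored as a string, one common approach is to convert it first with int() or float(). A minimal sketch, with made-up values and variable names:

```python
whole = "3"
decimal = "4.5"

# Convert first, then do the arithmetic on real numbers.
product = int(whole) * 100              # 300
total = float(whole) + float(decimal)   # 7.5

# Without conversion, + simply glues the two pieces of text together.
glued = whole + decimal                 # "34.5"
```

Note that int() only accepts whole-number text; int("4.5") raises a ValueError, so use float() for anything with a decimal point.
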
Manipulating Strings

Strings are versatile, and they can store many types of information, from single letters to entire documents of data. Strings are also fairly easy to work with in Python, but they are not mutable. This means you can do this:

myString = "I am a dog"
letter_i = myString[0]

But not this:

myString = "1234567890"
myString[0] = 0

Once a String is made, you cannot directly change it. To make changes to a string, you must perform the manipulation and save over the value. For example, to add some text to a string:

myString = "This is some text" 
myString = myString + " and this is some more text" 
print myString

Notice how we referenced the value of myString, but did not attempt to change it directly. Instead, we made a new string (line 2), and saved over the old value inside myString. If this is still unclear, please refer to your textbook.
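
The same idea covers the "change one character" case from earlier. This is a minimal sketch of one way to do it: build a new string from the replacement character plus a slice of the old one, then save over the old value. The changed flag is only there for illustration.

```python
myString = "1234567890"

# Assigning to a single position is not allowed on a string.
changed = True
try:
    myString[0] = "0"
except TypeError:
    changed = False

# Instead, make a new string: the new first character plus
# everything from index 1 onward, and save over the old value.
myString = "0" + myString[1:]
print(myString)  # 0234567890
```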

There are a few other ways to manipulate Strings as well. Your textbook outlines these well, and I will touch on them here. I recommend you try these in the Shell to see the results:

# Add two strings 
"I am a string" + "I am also a String" 

# Multiply strings 
"ha" * 100 

# Split a string on whitespace 
# Breaks the string into a list of smaller strings 
"this is a string".split() 

# Split a string on something else 
# In this case, split on the period 
# Don't forget quotes! 
"These are two sentences. We must separate them.".split(".") 

# Strip a string: removes the specified character 
# from the left and right ends of the string ONLY. 
"_hello world_".strip("_") 

# Notice the difference here: the underscore in the middle is kept. 
"_hello_world_".strip("_") 

# Take a slice of a string. 
# Remember this notation is like the range function. 
# It includes the lower limit (0), but not the upper limit (4). 
myString = "This is a string, but I only want the first word." 
myString[0:4] # Includes 0, 1, 2, 3. 

# Number of characters in a string (look familiar?) 
len("There are 39 characters in this string.")
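
These operations are often combined. As a rough sketch, here is one way to split the two-sentence example on the period and then tidy up each piece with strip(). The names text, piece, and sentences are made up for this example.

```python
text = "These are two sentences. We must separate them."

sentences = []
for piece in text.split("."):
    piece = piece.strip()   # no argument: strips whitespace from both ends
    if piece != "":         # split leaves an empty string after the last period
        sentences.append(piece)

print(sentences)  # ['These are two sentences', 'We must separate them']
```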

Remember when a String is in a list, you must access the String as an element in the list before you can perform these operations on it. For example, I have the list of strings:

myStrings = ["a", "b", "c", "d"]

To access these strings, I have to reference them inside the list. If I wanted to add them all together:

myStrings = ["a", "b", "c", "d"]

# Start with an empty string so there is something to add to.
newString = ""
for i in range(len(myStrings)):
    newString = newString + myStrings[i]

Try this and see what your result is. This method can be very helpful for various problems, including your recent minor lab.
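
For comparison, here is a sketch of two other ways to get the same result: looping over the elements directly instead of their indices, and using the built-in join() method. The names joined and dashed are only for illustration.

```python
myStrings = ["a", "b", "c", "d"]

# Loop over the elements themselves rather than over range(len(...)).
newString = ""
for s in myStrings:
    newString = newString + s

# join() does the same in one call. The string it is called on
# ("" here) is placed between the elements.
joined = "".join(myStrings)    # "abcd"
dashed = "-".join(myStrings)   # "a-b-c-d"
```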

Minor Lab: Word Count

A lot of students struggled with this lab. Let’s see how we can take a document, and count the lines, words, and characters inside. You will need to know this stuff for your major lab.

Here’s my .txt file:

Here is my document. 
I have a few lines, and some words. 
I've added some numbers at the end after the blank line. 


Now let’s open the file, and calculate three things:

  1. The number of lines in the file
  2. The number of words in the file
  3. The number of characters in the file (including spaces, but excluding new line characters)

Then we’re going to create a new file, output.txt, and print our results to that file.

Remember when I hit enter in Notepad (TextEdit for Mac OS X), I created a new line character, which shows up as "\n" in Python.
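
A small sketch of what that means for counting: the new line character is a single character and is included in len() until it is stripped. The variable names are made up for this example.

```python
line = "Here is my document.\n"

withNewline = len(line)                   # 21: "\n" counts as one character
withoutNewline = len(line.strip("\n"))    # 20: new line character removed

print(repr(line))  # repr() shows the \n instead of printing a line break
```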

Note: This example requires that we copy a list. You should know from class that there are special methods to copy list A to list B so that changes to B do not affect A. There is a well-written example of this here.
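
As a quick reminder, here is a minimal sketch of the difference between giving a list a second name and actually copying it with the list constructor. The names listA, alias, and listB are only for illustration.

```python
listA = ["one\n", "two\n"]

alias = listA        # NOT a copy: both names refer to the same list
listB = list(listA)  # a new list holding the same elements

listB[0] = "changed"        # listA is unaffected
alias[1] = "also changed"   # listA changes too, since alias IS listA

print(listA)  # ['one\n', 'also changed']
print(listB)  # ['changed', 'two\n']
```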

# File: minorLabCounter.py 
# Author: Jamie Counsell 

# This file computes the number of lines, 
# words, and characters in a given file. 
# Note: there are many ways to accomplish this task. 
# The one I have chosen is easiest to explain. 

# Open and read the file. ONLY DO THIS ONCE. 
# Save the data to a variable that you do not change, 
# so you always have the data when it's needed. 
fileName = raw_input("Enter the file name:") #Always returns String 
f = open(fileName, 'r') 

# Break the file into a list of lines, where each line ends 
# with the new line character, except for the last line. 
fileContents = f.readlines() 
f.close() # Done with the file; the data now lives in fileContents 

# Here's a visual example of why I only need 
# to open the file once. 
# Use the list constructor to create 
# a new instance of the list every time. 
lineCountData = list(fileContents) 
wordCountData = list(fileContents) 
charCountData = list(fileContents) 

# --------------- LINE COUNT -------------------- 
# We know we just separated the data into a list 
# of lines, where each line is a single element. 
# Therefore, the length of that list is the same as 
# the number of lines. 
lineCount = len(lineCountData) 

# --------------- WORD COUNT -------------------- 
# We already have the lines separated, but need to further 
# separate the words from each other. To do this, we can use the 
# split() command. Remember we must split up the words in each 
# line inside the list. 
# Initialize our sum 
wordCount = 0 

# Iterate through our entire list (each line) 
for i in range(len(wordCountData)): 
    # Split the line into a list of words. 
    wordCountData[i] = wordCountData[i].split() 
    # Update our sum with the number of words in each line 
    wordCount = wordCount + len(wordCountData[i]) 

# ------------- CHARACTER COUNT ----------------- 
# Initialize our sum 
charCount = 0 

for i in range(len(charCountData)): 
    # Strip new line characters from the ends of the line 
    charCountData[i] = charCountData[i].strip("\n") 
    # Update our sum with the character count of each line 
    charCount = charCount + len(charCountData[i]) 

# ----------------------------------------------- 
# Now we want to construct the output file. 
# The best way to do this is to create a string, and then 
# simply write that string to a new file. 
outString = "Line Count = " + str(lineCount) 
outString = outString + "\nWord Count = " + str(wordCount) 
outString = outString + "\nChar Count = " + str(charCount) 

# Open output file or create if it doesn't exist. 
outFile = open("output.txt", 'w') 
# Write 
outFile.write(outString) 
# Close and save changes 
outFile.close() 

Or a much simpler version of the solution:

# Note: if the file ends with a new line character, split("\n") yields one extra empty line.
data = open(raw_input("Enter the file name:"), 'r').read()      # Read the file to a string 
lines = "Lines: " + str(len(data.split("\n")))                  # lines: len of split on newline 
words = "Words: " + str(len(data.split()))                      # words: len of split on whitespace 
chars = "Chars: " + str(sum(len(x) for x in data.split("\n")))  # chars: sum of chars in each line 
open("output.txt", 'w').write("\n".join([lines, words, chars])) # Write data to file

The expression inside sum() in the character count is a generator expression, a close relative of the list comprehension, and both are interesting tools in Python. If you’d like to know more about how these work, try the Python documentation here, or ask me if you see me around!
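
To unpack the character count a little, here is a small sketch comparing three equivalent ways to total the lengths of some lines. The sample data and variable names are made up for this example.

```python
lines = ["Here is my document.", "", "abc"]

# List comprehension: builds the whole list of lengths first.
lengths = [len(x) for x in lines]      # [20, 0, 3]
listTotal = sum(lengths)               # 23

# Generator expression: the same idea without the square brackets,
# handed straight to sum() one value at a time.
genTotal = sum(len(x) for x in lines)  # 23

# The plain loop that both of the above replace.
loopTotal = 0
for x in lines:
    loopTotal = loopTotal + len(x)
```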