Day 17: Word Count

Objective

Your task today is to create a program that counts the total number of words in a given sentence or paragraph. Word counting is one of the simplest and most useful text-processing operations and has applications in everything from basic text analysis to advanced natural language processing (NLP).

For example:

  • Input: "The quick brown fox jumps over the lazy dog"
  • Output: 9 words

Why This Challenge Is Important

This task introduces you to string processing and teaches you how to:

  1. Split and Analyze Text: Break a string into smaller components (words).
  2. Handle Input Variations: Deal with cases like extra spaces or punctuation.
  3. Work with Strings and Loops: Gain confidence manipulating text data, a crucial skill in programming.

Steps to Solve

1. Understand the Problem

  • For this challenge, a “word” is any sequence of non-whitespace characters separated by whitespace (spaces, tabs, or newlines).
  • You’ll read a sentence or paragraph as input and count the number of words.

2. Basic Plan

  1. Take a string input from the user.
  2. Split the string into words, treating any run of whitespace as a single separator.
  3. Count the resulting words and output the total.

3. Handling Edge Cases

  • Input with extra spaces (e.g., "  Hello   world  ").
  • Empty input (e.g., "").
  • Input with punctuation (e.g., "Hello, world!").
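
To see why these cases matter, the following minimal sketch compares splitting on a single literal space with Python's whitespace-aware str.split() called with no arguments. The helper names naive_count and whitespace_count are illustrative and not part of the challenge.

```python
def naive_count(text):
    # Splitting on one literal space keeps an empty string for every
    # doubled, leading, or trailing space, which inflates the count.
    return len(text.split(" "))


def whitespace_count(text):
    # split() with no arguments splits on runs of whitespace and ignores
    # leading/trailing whitespace, so no empty strings are produced.
    return len(text.split())


for sample in ["  Hello   world  ", "", "Hello, world!"]:
    print(repr(sample), naive_count(sample), whitespace_count(sample))
```

Punctuation stays attached to its neighboring word here ("Hello," is a single token), so it does not change the count as long as it is not surrounded by spaces.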

Code Examples

Python Example

Basic Word Count:

# Get input from the user
text = input("Enter a sentence or paragraph: ")

# Split the text into words
words = text.split()

# Count the number of words
word_count = len(words)

# Output the result
print("Word count:", word_count)

Handling Punctuation and Extra Spaces:

import re

# Get input from the user
text = input("Enter a sentence or paragraph: ")

# Extract every run of word characters (letters, digits, underscore);
# punctuation and whitespace are skipped rather than removed
words = re.findall(r'\b\w+\b', text)

# Count the number of words
word_count = len(words)

# Output the result
print("Word count:", word_count)
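
One caveat with the pattern above: \w+ treats an apostrophe or hyphen as a boundary, so "don't" is counted as two words. If contractions and hyphenated words should count once, a pattern along the following lines is one option. This is a sketch that assumes ASCII apostrophes and hyphens are the only joiners that matter; the names WORD_PATTERN and count_joined_words are illustrative.

```python
import re

# Assumption: a word is a run of word characters, optionally joined by
# single apostrophes or hyphens (as in "don't" or "well-known").
WORD_PATTERN = re.compile(r"\w+(?:['-]\w+)*")


def count_joined_words(text):
    # The group is non-capturing, so findall returns whole matches.
    return len(WORD_PATTERN.findall(text))


print(count_joined_words("Don't stop, it's a well-known trick!"))  # prints 6
```

The original pattern would report 9 for the same sentence, because it splits "Don't", "it's", and "well-known" into two pieces each.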

Java Example

Basic Word Count:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        // Get input from the user
        System.out.print("Enter a sentence or paragraph: ");
        String text = scanner.nextLine();

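        // Split on runs of whitespace; splitting an empty string would
        // yield one empty "word", so handle that case explicitly
        String trimmed = text.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");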

        // Count the number of words
        int wordCount = words.length;

        // Output the result
        System.out.println("Word count: " + wordCount);
    }
}

Handling Punctuation:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        // Get input from the user
        System.out.print("Enter a sentence or paragraph: ");
        String text = scanner.nextLine();

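        // Remove punctuation (keeping all whitespace, including tabs)
        text = text.replaceAll("[^a-zA-Z0-9\\s]", "").trim();
        // Splitting an empty string would yield one empty "word"
        String[] words = text.isEmpty() ? new String[0] : text.split("\\s+");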

        // Count the number of words
        int wordCount = words.length;

        // Output the result
        System.out.println("Word count: " + wordCount);
    }
}

JavaScript Example

Basic Word Count:

// Get input from the user
let text = prompt("Enter a sentence or paragraph:");

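// Split the text into words; filter(Boolean) drops the empty string that
// split() returns for empty input, and ?? "" covers a cancelled prompt
let words = (text ?? "").trim().split(/\s+/).filter(Boolean);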

// Count the number of words
let wordCount = words.length;

console.log("Word count:", wordCount);

Handling Punctuation:

// Get input from the user
let text = prompt("Enter a sentence or paragraph:");

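// Remove punctuation, split on whitespace, and drop empty strings
// (?? "" covers a cancelled prompt, which returns null)
let words = (text ?? "").replace(/[^\w\s]/g, "").trim().split(/\s+/).filter(Boolean);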

// Count the number of words
let wordCount = words.length;

console.log("Word count:", wordCount);

Edge Cases to Consider

  1. Empty Input: If the user provides no input, the word count should be 0.
  2. Only Spaces: Inputs like "   " should also return 0.
  3. Mixed Whitespace: Inputs with irregular spacing (e.g., "Hello    world") should still return the correct count.
  4. Punctuation: Ensure punctuation does not count as a separate word.
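
The four cases above can be checked with a handful of assertions. The sketch below assumes a hypothetical word_count helper built on the same re.findall approach as the Python example; the sample inputs are made up for illustration.

```python
import re


def word_count(text):
    # Each run of word characters counts as one word; whitespace and
    # punctuation are never matched, so they cannot inflate the count.
    return len(re.findall(r"\w+", text))


# (input, expected count) pairs, one per edge case
cases = [
    ("", 0),                   # empty input
    ("     ", 0),              # only spaces
    ("Hello \t  world\n", 2),  # mixed whitespace
    ("Hello , world !", 2),    # free-standing punctuation
]

for text, expected in cases:
    assert word_count(text) == expected, (text, expected)
print("All edge cases pass")
```

The last case is the one a plain whitespace split gets wrong: it would count the free-standing "," and "!" as words and report 4.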

Extensions to Explore

  1. Count Each Word: Extend the program to count the frequency of each word in the input.
  2. Character Count: Add functionality to count the total number of characters, excluding spaces.
  3. Longest Word Finder: Modify the program to identify the longest word in the input.
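
As a starting point, all three extensions can be sketched with the standard library. The function name analyze, the lowercasing step, the choice to exclude all whitespace (not only spaces) from the character count, and the tie-breaking rule for the longest word are assumptions rather than requirements of the challenge.

```python
import re
from collections import Counter


def analyze(text):
    # Lowercase first so "The" and "the" are tallied together (an assumption).
    words = re.findall(r"\w+", text.lower())

    # Extension 1: map each word to its number of occurrences.
    frequencies = Counter(words)
    # Extension 2: count every character that is not whitespace.
    char_count = sum(1 for ch in text if not ch.isspace())
    # Extension 3: max() keeps the first of several equally long words
    # and falls back to "" when there are no words at all.
    longest = max(words, key=len, default="")

    return frequencies, char_count, longest


freq, chars, longest = analyze("The quick brown fox jumps over the lazy dog")
print(freq["the"], chars, longest)  # prints: 2 35 quick
```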

What You’ve Learned

  • How to split and analyze text using string manipulation.
  • Techniques for cleaning and processing user input.
  • Practical applications of working with strings in programming.

Next Steps

In Day 18: Find Unique Words, you’ll expand your string manipulation skills by identifying and counting unique words in a sentence or paragraph. This task builds on today’s work and introduces the concept of data deduplication!