Today's tutorial explores how to parse a string in C++. Parsing a string,
or line of characters, is a crucial exercise in programming, as many programs
need to read lines of text for bits of data, usually held in character-based
strings. Parsing appears in many different areas of programming, from
networking programs that need to parse a URL, to a graphics engine that needs
to parse a model file for a 3D shape, to a program like Microsoft Office that
counts the number of words in a document or selected line of text.
Further, we will be implementing this program with command line arguments.
These are the parameters found in the main() function: argc and argv[], which
respectively stand for argument count and argument vector. argc is an integer
that holds the number of arguments being passed to the program when it runs,
and argv[] is an array of C-style character strings holding argc entries,
followed by one terminating NULL entry. The strings in argv can be simple
strings, as in our case, but they can also be things like file paths for
image files and text files that hold data.
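As a small illustration of how argc and argv[] fit together, here is a minimal sketch that copies the arguments into std::string objects. The helper name collectArguments is hypothetical and is not part of the tutorial's program; it assumes argv holds at least argc valid strings.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the tutorial's program):
// copies argv[0] .. argv[argc - 1] into std::string objects.
std::vector<std::string> collectArguments(int argc, char* argv[])
{
    std::vector<std::string> arguments;
    for (int i = 0; i < argc; i++)
    {
        // Each argv[i] is a C-style string; std::string makes its own copy.
        arguments.push_back(argv[i]);
    }
    return arguments;
}
```

Called from main() as collectArguments(argc, argv), the first element of the result would be the program name and the rest would be the strings typed after it.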
We will be taking normal string arguments for our program, and parsing them
by counting the number of words inside each string. We do this by checking for
whitespace, i.e. ' ', in each string to find where one word ends and the next
begins. The strings will be normal, double quotation strings (" ") like in C++.
The first line in the main() function is a call to printf(), with a string
supplied as the first argument and argv[0] as the second. The string
"Program name %s\n" will output "Program name" followed by the name you gave
the program when you compiled it: it is the name of the application being run,
and by convention the first element of argv[] is always the application's
name. The %s inside the string is a format specifier, indicating where certain
text goes in that string. %s stands for a string value, with argv[0] being
supplied as the string to copy into the position where %s sits in the string.
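To see the format specifiers in isolation, the following sketch builds the formatted text in a buffer with std::snprintf instead of printing it. The helper names formatProgramName and formatWordCount are hypothetical, and the 256-character buffer size is an arbitrary assumption.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds the text that
// printf("Program name %s\n", name) would print.
std::string formatProgramName(const char* name)
{
    char buffer[256]; // assumed to be large enough for this illustration
    // %s is replaced by the C-style string passed after the format string.
    std::snprintf(buffer, sizeof(buffer), "Program name %s\n", name);
    return std::string(buffer);
}

// Hypothetical helper: %i is replaced by an int value in the same way.
std::string formatWordCount(int count)
{
    char buffer[256];
    std::snprintf(buffer, sizeof(buffer),
                  "The number of words in this line is %i \n", count);
    return std::string(buffer);
}
```

std::snprintf never writes more than the buffer size it is given, so an over-long name would be truncated rather than overflow the buffer.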
An important point to make for argv[]: it always ends with a NULL value, stored
at argv[argc]. This means that that value needs to be tested for when walking
through our command line arguments, which is what we do in the first 'for'
loop inside the 'if' statement, and again in the second 'for' loop below the
first. This also means that argv normally holds a minimum of two elements: the
executable name and a NULL value to indicate the end of the argument list.
The NULL gives the argument vector an end point to test for, so that a loop
over it can stop instead of running past the end of the array.
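The NULL terminator can also drive a loop on its own, without consulting argc. The sketch below assumes a well-formed argv that ends with a null pointer; the helper name countUntilNull is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: counts the entries in argv by walking the array
// until the terminating null pointer is reached.
int countUntilNull(char* argv[])
{
    int count = 0;
    while (argv[count] != NULL)
    {
        count++;
    }
    return count;
}
```

For the argv passed to main(), the value returned this way should match argc.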
The second 'for' loop is where we tokenize our string, meaning to break the
string into several words, or in our case to simply count the words inside a
string. Tokenization is crucial for areas of programming that require the
analysis of strings; in data security, the same term describes masking
sensitive data by replacing it with reference values. Each argument is copied
into a std::string variable, which is then passed to the wordCounter function.
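For the other half of tokenization, actually separating the words rather than counting them, one possible sketch uses std::istringstream. This is a different technique from the one used in this tutorial's program, and the function name tokenize is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a line into whitespace-separated words.
std::vector<std::string> tokenize(const std::string& line)
{
    std::vector<std::string> tokens;
    std::istringstream stream(line);
    std::string word;
    // operator>> skips leading whitespace, then reads characters
    // up to the next whitespace character.
    while (stream >> word)
    {
        tokens.push_back(word);
    }
    return tokens;
}
```

With this approach the word count is simply tokens.size(), and runs of several spaces do not produce empty tokens.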
Each character in the string supplied with the call to wordCounter(string) is
analyzed in this function. The 'if' statement tests each character for two
conditions: A, that the current character is whitespace, and B, that the
character before it is not whitespace (to prevent counting multiple spaces as
a word). If both conditions are met, the integer wordCount, which starts at 1
to account for the first word, is incremented. When the line is fully read,
the function returns the value of wordCount to the count variable inside the
'if' statement. After that, the program outputs a message with the count
variable, displaying the number of words counted in the string.
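Because the approach described above starts from 1 and counts word boundaries, an empty string still reports one word, and a string with a trailing space reports one word too many. A variant that counts word starts instead avoids both cases. This is only a sketch under that assumption, not the tutorial's wordCounter; the name countWordStarts is hypothetical, and it treats any whitespace character recognized by std::isspace as a separator.

```cpp
#include <cctype>
#include <string>

// Hypothetical variant: counts the positions where a word begins,
// i.e. a non-whitespace character that does not follow another one.
int countWordStarts(const std::string& line)
{
    int count = 0;
    bool insideWord = false;
    for (std::string::size_type i = 0; i < line.length(); i++)
    {
        // The cast avoids undefined behavior for negative char values.
        bool isSpace = std::isspace(static_cast<unsigned char>(line[i])) != 0;
        if (!isSpace && !insideWord)
        {
            count++; // first character of a new word
        }
        insideWord = !isSpace;
    }
    return count;
}
```

Since the loop never looks at the previous character directly, it also has no special case for the first character of the string.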
Navigate in the command prompt to the location of your testing area, start
Notepad and create a .cpp file, and copy and paste the code below. When
finished, save the file and compile it using:
g++ -Wall filename.cpp -o applicationName.exe
When you run the executable, add some strings in double quotation marks after
the executable name, like so:
applicationName.exe "Hello There!" ":D How are you?" "Good, thank you very much"
Your program will output the application name, followed by the strings you've
input. Then each string will be passed to the wordCounter function, and the
program will output both the string and the message with its word count.
New Terms:
Tokenization: The programming act of separating the words in a string, either
counting them or moving them into separate data structures altogether.
argc: The main() parameter that holds the number of arguments passed into the
application. Use this to keep count of the arguments: remember that the
application name normally counts as the first argument, so test for
argc >= 2 to check whether any arguments were passed into the application.
argv[]: The second main() parameter; this one holds the actual string
arguments that the application will be working with. Remember that the first
value (argv[0]) is by convention the application name, and that the element
after the last argument (argv[argc]) will always be a NULL value. Remember to
test for the NULL value, and never pass it to any functions.
Code
#include <stdio.h>
#include <string>
#include <iostream>

int wordCounter(std::string);

int main( int argc, char *argv[] )
{
    printf("Program name %s\n", argv[0]);
    if( argc >= 2 )
    {
        //Print every argument, skipping the terminating NULL at argv[argc]
        for(int i = 0; i <= argc; i++)
        {
            if(argv[i] != NULL) printf("%s \n", argv[i]);
        }
        //Count the words in each argument after the program name
        for(int i = 1; i <= argc; i++)
        {
            if(argv[i] != NULL) {
                std::string argument = argv[i];
                printf("%s \n", argv[i]); //Print the argument
                int count = wordCounter(argument);
                printf("The number of words in this line is %i \n", count);
            }
        }
    }
    else
    {
        printf("No arguments were supplied.\n");
    }
    return 0;
}
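int wordCounter(std::string line)
{
    int wordCount = 1; //initialize to one word counted
    //start at index 1 so that line[i - 1] never reads before the string
    for(unsigned int i = 1; i < line.length(); i++)
    {
        //a space that follows a non-space character marks the end of a word
        if(line[i] == ' ' && line[i - 1] != ' ')
            wordCount++;
    }
    return wordCount;
}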