C strings and C++ strings
C strings (a.k.a. null-terminated strings)
Declaration
A C string is usually declared as an array of char. However, an array of char is NOT by itself a C string. A valid C string requires the presence of a terminating "null character" (a character with ASCII value 0, usually represented by the character literal '\0').
Since char is a built-in data type, no header file is required to create a C string. The C library header file <cstring> contains a number of utility functions that operate on C strings.
Here are some examples of declaring C strings as arrays of char:
char s1[20];                                      // Character array - can hold a C string, but is not yet a valid C string
char s2[20] = { 'h', 'e', 'l', 'l', 'o', '\0' };  // Array initialization
char s3[20] = "hello";                            // Shortcut array initialization
char s4[20] = "";                                 // Empty or null C string of length 0, equal to ""
It is also possible to declare a C string as a pointer to a char:
const char* s3 = "hello";
This creates an unnamed character array just large enough to hold the string (including the null character) and places the address of the first element of the array in the char pointer s3. The characters of a string literal must not be modified, which is why the pointer is declared const; standard C++ (since C++11) does not allow a string literal to initialize a plain char*. This is a somewhat advanced method of manipulating C strings that should probably be avoided by inexperienced programmers. If used improperly, it can easily result in corrupted program memory or runtime errors.
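The difference between "an array of char" and "a valid C string" can be checked in code. The sketch below is only an illustration; the helper name holdsCString is an assumption, not part of any library. It searches the array for a null character and reports whether one is present.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a library function): returns true if the first
// `size` elements of `buf` contain a null character, i.e. the array
// currently holds a valid C string.
bool holdsCString(const char* buf, std::size_t size)
{
    // memchr() looks at no more than `size` bytes, so it never runs off the array
    return std::memchr(buf, '\0', size) != nullptr;
}
```

For an array declared as char a[5] = { 'h', 'e', 'l', 'l', 'o' }; the call holdsCString(a, sizeof a) returns false, because there is no room left for the terminator; for char b[20] = "hello"; it returns true.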
Representation in Memory
Here is another example of declaring a C string:
char name[10] = "Karen";
The following diagram shows how the string name is represented in memory:
The individual characters that make up the string are stored in the elements of the array. The string is terminated by a null character. Array elements after the null character are not part of the string, and their contents are irrelevant.
A "null string" is a string with a null character as its first character:
The length of a null string is 0.
What about a C string declared as a char pointer?
const char* name = "Karen";
This declaration creates an unnamed character array just large enough to hold the string "Karen" (including room for the null character) and places the address of the first element of the array in the char pointer name:
Subscripting
The subscript operator may be used to access the individual characters of a C string:
cout << s3[1] << endl; // Prints the character 'e', the second character in the string "hello"
Since the name of a C string is converted to a pointer to a char when used in a value context, you can also use pointer notation to access the characters of the string:
cout << *s3 << endl; // Prints the character 'h', the character pointed to by s3
cout << *(s3 + 4) << endl; // Prints the character 'o', the fifth character in the string "hello"
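Pointer notation is most often seen in loops that walk a C string one character at a time until the null character is reached. The following sketch shows that idiom; the function name countChar is made up for this example.

```cpp
#include <cstddef>

// Hypothetical helper: counts how many times `target` occurs in the C string `s`.
std::size_t countChar(const char* s, char target)
{
    std::size_t count = 0;
    // p starts at the first character and advances until it points at '\0'
    for (const char* p = s; *p != '\0'; ++p)
    {
        if (*p == target)
            ++count;
    }
    return count;
}
```

Calling countChar("hello", 'l') returns 2. The same loop could be written with a subscript (s[i]) instead of a moving pointer; both forms access the same characters.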
String Length
You can obtain the length of a C string using the C library function strlen(). This function takes a character pointer that points to a C string as an argument. It returns a size_t (an unsigned integer type), the number of valid characters in the string (not including the null character).
Examples
char s[20] = "Some text";
cout << "String length is " << strlen(s) << endl; // Length is 9
// Loop through characters of string
for (int i = 0; i < (int) strlen(s); i++)
    cout << s[i];
cout << endl;
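The length of the string and the size of the array that holds it are two different numbers. As a rough illustration, the sketch below computes how many more characters an array could hold; roomLeft is a hypothetical name, and the template parameter N is simply the array size deduced by the compiler.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of additional characters that fit in `buf`,
// keeping one element in reserve for the null character.
// Assumes `buf` already holds a valid C string.
template <std::size_t N>
std::size_t roomLeft(const char (&buf)[N])
{
    return N - 1 - std::strlen(buf);
}
```

For char s[20] = "Some text"; sizeof s is 20, strlen(s) is 9, and roomLeft(s) is 10.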
String Comparison
Comparing C strings using the relational operators ==, !=, >, <, >=, and <= does not work correctly, since the array names will be converted to pointers. For example, the expression
if (s1 == s2) { ... }
actually compares the addresses of the first elements of the arrays s1 and s2, not their contents. Since those addresses are different, the relational expression is always false.
To compare the contents of two C strings, you should use the C library function strcmp(). This function takes two pointers to C strings as arguments, either or both of which can be string literals. It returns an integer less than, equal to, or greater than zero if the first argument is found, respectively, to be less than, to match, or be greater than the second argument.
The strcmp() function can be used to implement various relational expressions:
if (strcmp(s1, s2) < 0) // If the C string s1 is less than the C string s2
{ ... }
if (strcmp(s1, s2) == 0) // If the C string s1 is equal to the C string s2
{ ... }
if (strcmp(s1, s2) > 0) // If the C string s1 is greater than the C string s2
{ ... }
if (strcmp(s1, s2) <= 0) // If the C string s1 is less than or equal to the C string s2
{ ... }
if (strcmp(s1, s2) != 0) // If the C string s1 is not equal to the C string s2
{ ... }
if (strcmp(s1, s2) >= 0) // If the C string s1 is greater than or equal to the C string s2
{ ... }
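To make the address-versus-contents distinction concrete, here is a small sketch. The wrapper name sameText is an assumption made for illustration; it only hides the "== 0" test behind a readable name.

```cpp
#include <cstring>

// Hypothetical helper: true when the two C strings contain the same characters.
bool sameText(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;   // strcmp() returns 0 for equal contents
}
```

With char s1[20] = "apple"; and char s2[20] = "apple"; the two arrays live at different addresses, so comparing the addresses of their first elements reports "different", while sameText(s1, s2) returns true.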
Assignment
A character array (including a C string) cannot have a new value assigned to it with the assignment operator after it is declared.
char s1[20] = "This is a string";
char s2[20];
s1 = "Another string"; // error: invalid array assignment
s2 = s1; // error: invalid array assignment
The C++ compiler interprets these assignment statements as attempts to change the address stored in the array name, not as attempts to change the contents of the array. The address stored in an array's name may not be changed, since this could result in loss of access to the array storage.
To change the contents of a character array, use the C library function strcpy(). This function takes two arguments: 1) a pointer to a destination array of characters that is large enough to hold the entire copied string (including the null character), and 2) a pointer to a valid C string or a string literal. The function returns a pointer to the destination array, although this return value is frequently ignored.
Examples
char s1[20];
char s2[20] = "Another new string";
strcpy(s1, ""); // Contents of s1 changed to null string
strcpy(s1, "new string"); // Contents of s1 changed to "new string"
strcpy(s1, s2); // Contents of s1 changed to "Another new string"
If the string specified by the second argument is larger than the character array specified by the first argument, the string will overflow the array, corrupting memory or causing a runtime error.
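One way to guard against this overflow is to check the length before copying. The sketch below is one possible approach, not a standard function; the name copyIfFits and its "return false on failure" convention are assumptions chosen for the example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `src` into `dest` only if the whole string,
// including the null character, fits in `destSize` elements.
// Returns false and leaves `dest` unchanged when it does not fit.
bool copyIfFits(char* dest, std::size_t destSize, const char* src)
{
    std::size_t len = std::strlen(src);
    if (len + 1 > destSize)
        return false;
    std::memcpy(dest, src, len + 1);   // the +1 copies the null character as well
    return true;
}
```

A typical call passes the array and its size together, for example copyIfFits(s1, sizeof s1, s2). Note that sizeof gives the array size only when applied to the array itself, not to a pointer that refers to it.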
Input and Output
The stream extraction operator >> may be used to read data into a character array as a C string. If the data read contains more characters than the array can hold, the string can overflow the array unless the number of characters read is limited, for example by setting a field width with setw().
The stream insertion operator << may be used to print a C string or string literal.
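As an illustration of limiting the read, the sketch below sets a field width with setw() before extracting into a character array. The function name firstWordBounded and the array size of 8 are arbitrary choices for this example, and an istringstream stands in for cin so that the behavior is easy to try out.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: reads the first whitespace-delimited word of `text`
// into an 8-element array, so at most 7 characters are stored.
std::string firstWordBounded(const std::string& text)
{
    std::istringstream in(text);   // stands in for cin in this sketch
    char buf[8];
    buf[0] = '\0';                 // keeps buf a valid C string if nothing is read

    // With a width of 8, >> stores at most 7 characters plus the null character
    in >> std::setw(static_cast<int>(sizeof buf)) >> buf;
    return std::string(buf);
}
```

For the input "extraordinary day" the function returns "extraor"; the remaining characters stay in the stream rather than overflowing the array.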
Concatenation
The C library function strcat() can be used to concatenate C strings. This function takes two arguments: 1) a pointer to a destination character array that contains a valid C string, and 2) a pointer to a valid C string or string literal. The function returns a pointer to the destination array, although this return value is frequently ignored.
char s1[20] = "Hello";
char s2[20] = "friend";
strcat(s1, ", my "); // s1 now contains "Hello, my "
strcat(s1, s2); // s1 now contains "Hello, my friend"
The destination array must be large enough to hold the combined strings (including the null character). If it is not, the array will overflow.
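The C library also provides strncat(), which appends at most a given number of characters and then adds the null character. The sketch below uses it to append as much as will fit; the wrapper name appendTruncated is an assumption, and silently truncating may or may not be what a real program wants.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: appends as much of `src` to `dest` as fits in an
// array of `destSize` elements, always leaving `dest` null-terminated.
// Assumes `dest` already holds a valid C string.
void appendTruncated(char* dest, std::size_t destSize, const char* src)
{
    std::size_t used = std::strlen(dest);
    if (used + 1 >= destSize)
        return;                                    // no room for even one more character
    std::strncat(dest, src, destSize - used - 1);  // strncat() adds the '\0' itself
}
```

With char s1[12] = "Hello"; the call appendTruncated(s1, sizeof s1, ", my friend") leaves s1 holding "Hello, my f", which is 11 characters plus the terminator.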
Passing and returning
Regardless of how a C string is declared, when you pass the string to a function or return it from a function, the data type of the string can be specified as either char[] (array of char) or char* (pointer to char). In both cases, the string is passed or returned by address.
A string literal like "hello" is considered a constant C string, and typically has its data type specified as const char* (pointer to a char constant).
C++ string objects
Declaration
A C++ string is an object of the class string, which is defined in the header file <string> and which is in the standard namespace. The string class has several constructors that may be called (explicitly or implicitly) to create a string object.
Examples
string s1;           // Default constructor - creates an empty or null C++ string of length 0, equal to ""
string s2("hello");  // Explicit constructor call to initialize new object with C string
string s3 = "hello"; // Implicit constructor call to initialize new object with C string
string s4(s2);       // Explicit constructor call to initialize new object with C++ string
string s5 = s2;      // Implicit constructor call to initialize new object with C++ string
Representation in Memory
Here is another example of declaring a C++ string:
string name = "Karen";
name is a string object with several data members. The data member p is a pointer to (contains the address of) the first character in a dynamically-allocated array of characters. The data member length contains the length of the string. The data member capacity contains the number of valid characters that may currently be stored in the array (not including a null character).
A "null string" is a string with a null character as its first character:
The length of a null string is 0.
Subscripting
The subscript operator may be used to access the individual characters of a C++ string:
cout << s3[1] << endl; // Prints the character 'e', the second character in the string "hello"
The reason this works is a C++ feature called operator overloading. Using the subscript operator with a C++ string object actually calls a special member function named operator[] that has been defined as part of the string class. The subscript specified inside the brackets is passed as an argument to the member function, which then returns the character at that position in the string.
The name of a C++ string object is not a pointer, and you cannot use pointer notation with it or perform pointer arithmetic on it.
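Besides operator[], the string class has a member function at() that checks the index and throws std::out_of_range when it is not a valid position. The sketch below shows one way to use it; the helper name charAtOr and the idea of returning a fallback character are assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the character at `index`, or `fallback`
// when `index` is not a valid position in the string.
char charAtOr(const std::string& s, std::size_t index, char fallback)
{
    try
    {
        return s.at(index);              // at() validates the index first
    }
    catch (const std::out_of_range&)
    {
        return fallback;                 // index was past the end of the string
    }
}
```

For example, charAtOr("hello", 1, '?') returns 'e', while charAtOr("hello", 10, '?') returns '?'. Using s[10] on the same string would not be checked and would have undefined behavior.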
String Length
You can obtain the length of a C++ string using the string class methods length() or size(). Both methods return a size_t (an unsigned integer type), the number of valid characters in the string (not including the null character).
Examples
string s = "Some text";
cout << "String length is " << s.length() << endl; // Length is 9
// Loop through characters of string
for (int i = 0; i < (int) s.size(); i++)
    cout << s[i];
cout << endl;
String Comparison
C++ strings may be compared using the relational operators ==, !=, >, <, >=, and <=. A C++ string may be compared to either another C++ string or a valid C string, including a string literal. All such relational expressions resolve to the Boolean values true or false.
Examples
if (s1 > s2) // Compare two C++ strings
{ ... }
if ("cat" == s2) // Compare C string literal and C++ string
{ ... }
if (s3 != cstr) // Compare C++ string and array containing C string
{ ... }
Like subscripting, this works because of operator overloading.
Assignment
You can assign a C++ string, a C string, or a C string literal to a C++ string.
Examples
string s1 = "original string";
string s2 = "new string";
char s3[20] = "another string";
s1 = s2; // s1 changed to "new string"
s1 = s3; // s1 changed to "another string"
s1 = "yet another string"; // s1 changed to "yet another string"
Once again, this works because of operator overloading.
Input and Output
The stream extraction operator >> may be used to read data into a C++ string object.
The stream insertion operator << may be used to print a C++ string object.
Concatenation
The operator + may be used to concatenate C++ strings. C++ strings, C strings, and string literals may all be concatenated together, provided that at least one of the two operands of each + is a C++ string object (two C strings or string literals cannot be added to each other directly). The result is a C++ string object that may be assigned to another C++ string object, passed to a function that takes a C++ string object as an argument, printed, etc.
string s1 = "Hello";
string s2 = "good ";
char s3[10] = "friend";
s1 = s1 + ", my " + s2 + s3; // s1 now contains "Hello, my good friend"
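The + operator is evaluated from left to right, and each individual + needs a C++ string on at least one side. A short sketch follows; makeGreeting is a made-up function name used only to show the rule.

```cpp
#include <string>

// Hypothetical helper: builds "Hello, <name>!" from a C string.
std::string makeGreeting(const char* name)
{
    // Writing "Hello, " + name would not compile: both operands are
    // pointers to char, and two pointers cannot be added.
    // Converting the first operand to std::string makes every + in the
    // chain produce a std::string.
    std::string result = std::string("Hello, ") + name + "!";
    return result;
}
```

For example, makeGreeting("Karen") returns the C++ string "Hello, Karen!".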
Passing and returning
C++ string objects are passed and returned by value by default. This results in a copy of the string object being created.
To save memory (and a possible call to the copy constructor), a string object is frequently passed by reference instead.
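The three common ways of passing a string object can be compared side by side. The function names in this sketch (shout, countSpaces, appendBang) are invented for the example.

```cpp
#include <cstddef>
#include <string>

// Passed by value: the function works on its own copy of the string.
std::string shout(std::string s)
{
    s += "!";
    return s;
}

// Passed by reference to const: no copy is made, and the function
// cannot change the caller's string.
std::size_t countSpaces(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
        if (c == ' ')
            ++n;
    return n;
}

// Passed by non-const reference: changes are visible to the caller.
void appendBang(std::string& s)
{
    s += "!";
}
```

After std::string a = "hi there you"; the call shout(a) returns "hi there you!" but leaves a unchanged, countSpaces(a) returns 2, and appendBang(a) changes a itself to "hi there you!".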
Converting one string type to the other
Sometimes you have one type of string, but you want to use a function or method that requires the other type. In that case, it's useful to be able to convert one string type to the other.
You can easily create a C++ string object from a C string or string literal. Declare the string object and pass the C string or string literal as a constructor argument.
What if you have a C++ string object and need to convert it to a C string? The string class provides a method called c_str() that returns a pointer to the underlying array of characters that holds the contents of the string. The array is guaranteed to end with a null character (older, pre-C++11 implementations could append one when c_str() was called). The C string returned by this method cannot be modified, but it can be used, printed, copied, etc.
char s1[20];
string s2 = "My C++ string";
strcpy(s1, s2.c_str()); // Copies the C string "My C++ string" into the array s1
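Because the destination of such a copy is an ordinary character array, the usual overflow risk applies. The sketch below checks the length first and then converts back the other way with a constructor; toCString is a hypothetical helper name, not a member of the string class.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies a C++ string into a character array, but only
// when the string and its null character fit in `destSize` elements.
bool toCString(const std::string& src, char* dest, std::size_t destSize)
{
    if (src.length() + 1 > destSize)
        return false;                     // would overflow the array
    std::strcpy(dest, src.c_str());       // c_str() supplies the null-terminated view
    return true;
}
```

Going back from the array to a string object needs no helper: std::string back(buf); constructs a C++ string from the C string stored in buf.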
So which of these string types should I use?
Use C++ strings whenever possible. Unfortunately, it's not always possible to avoid using C strings.
- Command line arguments are passed into main() as C strings
- File names have to be specified as C strings when opening a file (before C++11; since then, file streams also accept C++ strings)
- There are a number of useful C string library functions that have no equivalent in the C++ string class
- C++ strings can't be serialized in binary format without writing a bunch of extra code
- etc.
In short, a good C++ programmer needs to understand and be able to manipulate both types of strings.
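Command line arguments are a common place where the two string types meet. One possible way to handle them is to convert the C strings to C++ strings once, at the start of the program. The function name collectArgs below is an assumption for this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: converts the C strings in argv into C++ strings.
// Each element is built with the string constructor that takes a C string.
std::vector<std::string> collectArgs(int argc, const char* const argv[])
{
    return std::vector<std::string>(argv, argv + argc);
}
```

In main(int argc, char* argv[]) this could be called as collectArgs(argc, argv); afterwards the arguments can be compared with == and concatenated with + like any other C++ strings.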
==============================================================================================================================================================
C++ Reference Material
Strings in C and C++
This page summarizes many of the things you may find it useful to know when working with either C-strings or objects of the C++ string class.
The term string generally means an ordered sequence of characters, with a first character, a second character, and so on, and in most programming languages such strings are enclosed in either single or double quotes. In C++ the enclosing delimiters are double quotes. In this form the string is referred to as a string literal and we often use such string literals in output statements when we wish to display text on the screen for the benefit of our users. For example, the usual first C++ program displays the string literal "Hello, world!" on the screen with the following output statement:
cout << "Hello, world!" << endl;
However, without string variables about all we can do with strings is output string literals to the screen, so we need to expand our ability to handle string data. When we talk about strings in C++, we must be careful because the C language, with which C++ is meant to be backward compatible, had one way of dealing with strings, while C++ has another, and to further complicate matters there are many non-standard implementations of C++ strings. These should gradually disappear as compiler vendors update their products to implement the string component of the C++ Standard Library.
As a programmer, then, you must distinguish between the following three things:
- An "ordinary" array of characters, which is just like any other array and has no special properties that other arrays do not have.
- A C-string, which consists of an array of characters terminated by the null character '\0', and which therefore is different from an ordinary array of characters. There is a whole library of functions for dealing with strings represented in this form. Its header file is <cstring>. In some implementations this library may be automatically included when you include other libraries such as the <iostream> library. Note that the null character may very well not be the very last character in the C-string array, but it will be the first character beyond the last character of the actual string data in that array. For example if you have a C-string storing "Hello" in a character array of size 10, then the letters of the word "Hello" will be in positions with indices 0 to 4, there will be a null character at index 5, and the locations with indices 6 to 9 will contain who-knows-what. In any case, it's the null character at index 5 that makes this otherwise ordinary character array a C-string.
- A C++ string object, which is an instance of a "class" data type whose actual internal representation you need not know or care about, as long as you know what you can and can't do with variables (and constants) having this data type. There is a library of C++ string functions as well, available by including the <string> header file.
Both the C-string library functions and the C++ string library functions are available to C++ programs. But, don't forget that these are two *different* function libraries, and the functions of the first library have a different notion of what a string is from the corresponding notion held by the functions of the second library. There are two further complicating aspects to this situation: first, though a function from one of the libraries may have a counterpart in the other library (i.e., a function in the other library designed to perform the same operation), the functions may not be used in the same way, and may not even have the same name; second, because of backward compatibility many functions from the C++ string library can be expected to work fine and do the expected thing with C-style strings, but not the other way around.
The last statement above might seem to suggest we should use C++ strings and forget about C-strings altogether, and it is certainly true that there is a wider variety of more intuitive operations available for C++ strings. However, C-strings are more primitive, you may therefore find them simpler to deal with (provided you remember a few simple rules, such as the fact that the null character must always terminate such strings), and certainly if you read other, older programs you will see lots of C-strings. So, use whichever you find more convenient, but if you choose C++ strings and occasionally need to mix the two for some reason, be extra careful. Finally, there are certain situations in which C-strings must be used.
To understand strings, you will have to spend some time studying sample programs. This study must include the usual prediction of how you expect a program to behave for given input, followed by a compile, link and run to test your prediction, as well as subsequent modification and testing to investigate questions that will arise along the way. In addition to experimenting with any supplied sample programs, you should be prepared to make up your own.
In the following examples we attempt to draw the distinction between the two string representations and their associated operations. The list is not complete, but we do indicate how to perform many of the more useful kinds of tasks with each kind of string. The left-hand column contains examples relevant to C-strings and the right-hand column shows analogous examples in the context of C++ strings.
C-strings (#include <cstring>)          C++ strings (#include <string>)
==============================          ===============================

Declaring a C-string variable           Declaring a C++ string object
-----------------------------           -----------------------------
char str[10];                           string str;

Initializing a C-string variable        Initializing a C++ string object
--------------------------------        --------------------------------
char str1[11] = "Call home!";           string str1("Call home!");
char str2[] = "Send money!";            string str2 = "Send money!";
char str3[] = {'O', 'K', '\0'};         string str3("OK");
Last line above has same effect as:
char str3[] = "OK";                     string str4(10, 'x');

Assigning to a C-string variable        Assigning to a C++ string object
--------------------------------        --------------------------------
Can't do it, i.e., can't do this:       string str;
char str[10];                           str = "Hello";
str = "Hello!";                         str = otherString;

Concatenating two C-strings             Concatenating two C++ string objects
---------------------------             ------------------------------------
strcat(str1, str2);                     str1 += str2;
strcpy(str, strcat(str1, str2));        str = str1 + str2;

Copying a C-string variable             Copying a C++ string object
---------------------------             ---------------------------
char str[20];                           string str;
strcpy(str, "Hello!");                  str = "Hello";
strcpy(str, otherString);               str = otherString;

Accessing a single character            Accessing a single character
----------------------------            ----------------------------
str[index]                              str[index]
                                        str.at(index)
                                        str.substr(index, count)

Comparing two C-strings                 Comparing two C++ string objects
-----------------------                 --------------------------------
if (strcmp(str1, str2) < 0)             if (str1 < str2)
    cout << "str1 comes 1st.";              cout << "str1 comes 1st.";
if (strcmp(str1, str2) == 0)            if (str1 == str2)
    cout << "Equal strings.";               cout << "Equal strings.";
if (strcmp(str1, str2) > 0)             if (str1 > str2)
    cout << "str2 comes 1st.";              cout << "str2 comes 1st.";

Finding the length of a C-string        Finding the length of a C++ string object
--------------------------------        -----------------------------------------
strlen(str)                             str.length()

Output of a C-string variable           Output of a C++ string object
-----------------------------           -----------------------------
cout << str;                            cout << str;
cout << setw(width) << str;             cout << setw(width) << str;
In what follows, keep in mind that cin >> skips leading white space when reading a string and stops at the next white space character, while cin.get(), cin.getline() and getline() do not skip white space. Remember too that cin.getline() and getline() consume the delimiter while cin.get() does not. Finally, cin can be replaced with any open input stream, since file input with inFile, say, behaves in a manner completely analogous to the corresponding behavior of cin. Analogously, in the output examples given immediately above, cout could be replaced with any text output stream variable, say outFile. In all cases, numCh is the maximum number of characters that will be read.
Input of a C-style string variable      Input of a C++ string object
----------------------------------      ----------------------------
cin >> s;                               cin >> s;
cin.get(s, numCh+1);
cin.get(s, numCh+1,'\n');
cin.get(s, numCh+1,'x');
cin.getline(s, numCh+1);                getline(cin, s);
cin.getline(s, numCh+1, '\n');
cin.getline(s, numCh+1, 'x');           getline(cin, s, 'x');
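The difference between >> and getline() for C++ strings is easiest to see when both are applied to the same input. In the sketch below an istringstream stands in for cin; the struct ParsedLine and the function splitFirstWord are names invented for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: the first word of a line and whatever follows it.
struct ParsedLine
{
    std::string firstWord;
    std::string rest;
};

// Hypothetical helper: reads one word with >> and the remainder of the
// first line with getline().
ParsedLine splitFirstWord(const std::string& text)
{
    std::istringstream in(text);      // stands in for cin or an input file
    ParsedLine result;
    in >> result.firstWord;           // skips leading white space, stops at the next white space
    std::getline(in, result.rest);    // keeps white space, stops at and consumes '\n'
    return result;
}
```

For the input "  Karen Smith, age 30" followed by a newline and more text, firstWord is "Karen" and rest is " Smith, age 30", including the leading space that >> left unread.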
A useful naming convention for C-strings is illustrated by examples like
typedef char String80[81];
typedef char String20[21];
in which the two numbers in each definition differ by 1 to allow for the null character '\0' to be stored in the array of characters, but to *not* be considered as part of the string stored there. No analog to this naming convention is necessary for C++ strings, since for all practical purposes, each C++ string variable may contain a string value of virtually unlimited length.
=============================================================================================================================
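String header files
The C++ standard library is big. Very big. Incredibly big. How big? Put it this way: in the C++ standard, the specification of the standard library takes up more than 300 densely packed pages, and that does not include the standard C library, which is included in the C++ library only "by reference" (honestly, that is the wording used). Of course, bigger is not always better, but in this case it is, because a big library contains a lot of functionality. The more functionality the standard library has, the more you can lean on when developing your own applications. The C++ library does not provide everything (notably, there is no support for concurrency or graphical user interfaces), but it does provide a lot. You can turn to it for almost anything.
Before summarizing what is in the standard library, it is worth describing how it is organized. Because the library contains so much, there is a good chance that a class or function name you (or someone like you) choose will match a name in the standard library. To avoid the resulting name conflicts, virtually everything in the standard library is placed in the namespace std (see Item 28). But this creates a new problem. Countless existing C++ programs rely on functionality from the pseudo-standard library that has been in use for years, for example, functionality declared in headers such as <iostream.h>, <complex.h>, and <limits.h>. Existing software was not designed with namespaces in mind, and it would be shameful if wrapping the standard library in std broke existing code. (Pulling the rug out like that would make the programmers of existing code say things much harsher than "shameful".)
Mindful of the destructive power of enraged programmers, the standards committee decided to create new header names for the parts of the standard library wrapped in std. The new header names were formed simply by dropping the .h from the existing C++ header names; the method itself is not important, just as the resulting inconsistencies are not important. So <iostream.h> became <iostream>, <complex.h> became <complex>, and so on. The same method was applied to the C headers, but with a c added to the front of each name. So C's <string.h> became <cstring>, <stdio.h> became <cstdio>, and so on. A final point is that the old C++ headers are officially deprecated (that is, explicitly listed as no longer supported), but the old C headers are not (to preserve compatibility with C). In practice, compiler vendors will not stop supporting their customers' existing software, so the old C++ headers can be expected to remain supported for years to come.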
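So, in practical terms, here is the current state of C++ headers:
Old C++ header names such as <iostream.h> will continue to be supported, even though they are not in the official standard. The contents of these headers are not in namespace std.
New C++ header names such as <iostream> contain the same basic functionality as the corresponding old headers, but their contents are in namespace std. (During standardization, the details of some parts of the library were modified, so the entities in an old header and its new counterpart do not necessarily correspond exactly.)
Standard C headers such as <stdio.h> continue to be supported. Their contents are not in std.
The new C++ headers for the functionality of the C library have names such as <cstdio>. They offer the same contents as the corresponding old C headers, except that the contents are in std.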
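All of this looks a little odd at first, but it is not hard to get used to. The biggest challenge is keeping the string headers straight:
<string.h> is the old C header for the char*-based string manipulation functions;
<cstring> is the std-wrapped version of that old C header;
<string> is the std-wrapped C++ header for the new string class.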