Addresses, Pointers, Arrays and Strings - Library string handling functions

Chapter 6, section 15

The C programming language does not, in fact, support a string data type. However, strings are so useful that there is an extensive set of library functions for manipulating them. Three of the simplest functions are

Name        Function
strlen()    determine length of string
strcmp()    compare strings
strcpy()    copy a string

The first of these, strlen(), is particularly straightforward. Its single parameter is the address of the start of the string and its value is the number of characters in the string excluding the terminating NUL.
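
To make the description concrete, here is a minimal sketch of a hand-written equivalent. The name my_strlen is made up for this illustration and is not a library function; real implementations of strlen() are usually more elaborate, but the result should be the same.

```c
#include <stddef.h>
#include <string.h>

/* my_strlen is a made-up name: a hand-written equivalent of strlen() */
size_t my_strlen(const char *s)
{
	const char *p = s;	/* start at the first character */
	while(*p != '\0')	/* stop at the terminating NUL */
		p++;
	return (size_t)(p - s);	/* the NUL itself is not counted */
}
```

Called with the address of the string "Tuesday" it returns 7, and called with an empty string it returns 0.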

The second function, strcmp(), takes the start addresses of the two strings as parameters and returns the value zero if the strings are equal. If the strings are unequal it returns a negative or positive value. The returned value is positive if the first string is greater than the second string and negative if it is less. In this context the relative value of strings is decided by the first pair of characters that differ, compared using their numeric codes in the host computer character set.

It is important to realise that you cannot compare two strings by simply comparing their start addresses, although such a comparison is syntactically valid.
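
The two points above can be sketched with a pair of small helpers. Both names, sign_of_compare and same_contents, are hypothetical and exist only for this illustration. The ordering shown in the usage note assumes an ASCII character set.

```c
#include <string.h>

/* sign_of_compare is a hypothetical helper: it reduces the result
   of strcmp() to -1, 0 or 1, since only the sign is meaningful */
int sign_of_compare(const char *a, const char *b)
{
	int r = strcmp(a, b);
	return (r > 0) - (r < 0);
}

/* same_contents is also hypothetical: true when the characters
   match, wherever the two strings happen to be stored */
int same_contents(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

With ASCII codes, sign_of_compare("Monday","Sunday") is -1 because 'M' has a smaller code than 'S', and sign_of_compare("abc","ab") is 1 because 'c' is compared with the terminating NUL of the shorter string. Two separate arrays that both hold "Monday" have different start addresses, yet same_contents() reports them as equal.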

The third function, strcpy(), copies the string pointed to by the second parameter into the space pointed to by the first parameter. The entire string, including the terminating NUL, is copied and there is no check that the space indicated by the first parameter is big enough.
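
Since strcpy() makes no check itself, the caller has to. One possible approach is sketched below. The wrapper copy_checked is a hypothetical name and not part of the C library; it assumes the caller knows the size of the destination.

```c
#include <stddef.h>
#include <string.h>

/* copy_checked is a hypothetical wrapper, not a library function.
   Returns 1 on success, 0 if src and its NUL will not fit in dst. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
	if(strlen(src) + 1 > dstsize)	/* +1 for the terminating NUL */
		return 0;		/* dst is left untouched */
	strcpy(dst, src);		/* the space has been checked */
	return 1;
}
```

With an eight character destination, copying "Tuesday" succeeds (seven characters plus the NUL) but copying "Wednesday" is refused.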

A simple example is in order. This program, stall3, has the opposite effect to the example given earlier.

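#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	char	*days[] = {
		"Sunday",
		"Monday",
		"Tuesday",
		"Wednesday",
		"Thursday",
		"Friday",
		"Saturday"
			};
	int	i = 0;	/* index into days[], must start at zero */
	char	inbuf[128];
	printf("Enter the name of a day of the week ");
	/* fgets() replaces gets(), which was removed from the language in C11 */
	if(fgets(inbuf,sizeof inbuf,stdin) == NULL) exit(1);
	inbuf[strcspn(inbuf,"\n")] = '\0';	/* discard the newline */
	do
	{
		if(strcmp(days[i++],inbuf)==0)
		{
			printf("day number %d\n",i);
			exit(0);
		}
	} while(i<7);
	printf("Unrecognised day name\n");
	return 0;
}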
		
A typical dialogue
$ stall3
Enter the name of a day of the week Tuesday
day number 3
$ stall3
Enter the name of a day of the week Bloomsday
Unrecognised day name
$ stall3
Enter the name of a day of the week Friday
day number 6
$
The program is totally unforgiving of any errors in the input layout, such as leading or trailing spaces, entry all in lower case, or entry of abbreviations.
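
A more forgiving lookup might be sketched as follows. Both function names, equal_ignoring_case and day_number, are hypothetical. The sketch ignores leading and trailing white space and the case of letters; it makes no attempt to recognise abbreviations.

```c
#include <ctype.h>
#include <string.h>

/* equal_ignoring_case is a hypothetical helper: compares the first n
   characters of a with the whole of b, ignoring the case of letters */
static int equal_ignoring_case(const char *a, size_t n, const char *b)
{
	size_t i;
	if(strlen(b) != n) return 0;	/* lengths must match */
	for(i = 0; i < n; i++)
		if(tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/* day_number is hypothetical: returns 1..7 for a recognised
   day name and 0 for anything else */
int day_number(const char *input)
{
	static const char *days[] = {
		"Sunday", "Monday", "Tuesday", "Wednesday",
		"Thursday", "Friday", "Saturday"
	};
	size_t len;
	int i;
	while(isspace((unsigned char)*input))	/* skip leading spaces */
		input++;
	len = strlen(input);
	while(len > 0 && isspace((unsigned char)input[len-1]))
		len--;				/* ignore trailing spaces */
	for(i = 0; i < 7; i++)
		if(equal_ignoring_case(input, len, days[i]))
			return i + 1;
	return 0;
}
```

Here day_number("  tuesday ") gives 3 and day_number("FRIDAY") gives 6, while day_number("Bloomsday") still gives 0.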

To demonstrate the use of strlen(), here is a simple program, called stall4, that reads in a string and prints it out reversed, a tremendously useful thing to do. The repeated operation of this program is terminated by the user entering a string of length zero, i.e. hitting the RETURN key immediately after the program prompt.

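#include <stdio.h>
#include <string.h>

int main(void)
{
	char	inbuf[128];	/* input buffer */
	int	slen;	/* holds length of string */
	while(1)
	{
		printf("Enter a string ");
		/* fgets() never writes beyond the end of inbuf */
		if(fgets(inbuf,sizeof inbuf,stdin) == NULL) break;
		inbuf[strcspn(inbuf,"\n")] = '\0';	/* discard the newline */
		slen = (int)strlen(inbuf);	/* find length */
		if(slen == 0) break;	/* termination condition */
		while(slen > 0)
		{
			slen--;
			printf("%c",*(inbuf+slen));
		}
		printf("\n");
	}
	return 0;
}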

The program operates by printing the characters one by one, starting with the last non-NUL character of the string. Notice that "slen" is decremented before the character is output. This is correct because the length returned by strlen() excludes the NUL, so the actual characters are aggregate members 0 to length-1.
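
The same indexing can be sketched as a function that builds the reversed string in a second buffer rather than printing it. The name reverse_copy is hypothetical, and the sketch assumes the caller supplies a destination with room for strlen(src)+1 characters that does not overlap the source.

```c
#include <string.h>

/* reverse_copy is a hypothetical helper; dst must have room for
   strlen(src)+1 characters and must not overlap src */
void reverse_copy(char *dst, const char *src)
{
	size_t slen = strlen(src);	/* characters are src[0] .. src[slen-1] */
	size_t i = 0;
	while(slen > 0)
	{
		slen--;			/* decrement first, as in stall4 */
		dst[i++] = src[slen];
	}
	dst[i] = '\0';			/* terminate the reversed copy */
}
```

Decrementing before use means the first character copied is src[length-1] and the last is src[0]; the NUL at src[length] is never copied, so a new one is stored at the end.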

A typical dialogue is illustrated below.

$ stall4
Enter a string 1234
4321
Enter a string     x
x    
Enter a string abc def ghi
ihg fed cba
Enter a string 
$

Here is another version of the same program re-written using a more typical C programming style.

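#include <stdio.h>
#include <string.h>

int main(void)
{
	char	inbuf[128];	/* input buffer */
	int	slen;	/* holds length of string */
	while(1)
	{
		printf("Enter a string ");
		if(fgets(inbuf,sizeof inbuf,stdin) == NULL) break;
		inbuf[strcspn(inbuf,"\n")] = '\0';	/* discard the newline */
		if((slen = (int)strlen(inbuf)) == 0) break;
		while(slen--)printf("%c",*(inbuf+slen));
		printf("\n");
	}
	return 0;
}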

It illustrates the use of side-effects and address arithmetic and should be compared with the first version.

The next program is designed to drive home the point about comparing strings as distinct from comparing their start addresses.

#include <stdio.h>
#include <string.h>

int main(void)
{
	char	x[22],*y;	/* 21 characters plus the NUL */
	strcpy(x,"A Programming Example");
	y = x;

/*	First test - compare y with constant */

	if( y == "A Programming Example")
		printf("Equal 1\n");
	else
		printf("Unequal 1\n");

/*	Second test - compare using strcmp() */

	if(strcmp(x,"A Programming Example") == 0)
		printf("Equal 2\n");
	else
		printf("Unequal 2\n");

/*	Assign constant address and compare */

	y = "A Programming Example";
	if( y == "A Programming Example")
		printf("Equal 3\n");
	else
		printf("Unequal 3\n");
	return 0;
}
It produced the following output
Unequal 1
Equal 2
Unequal 3

The first comparison compares the address held in the variable "y" with the address of the place in memory where the compiler has stored the string constant "A Programming Example". Clearly the start address of the aggregate "x" is different from the address of the string constant, since strcpy() has only copied the characters.

The second test used strcmp() to compare the strings rather than their start addresses. The result is, not surprisingly, that the strings were, in fact, equal.

The final test looks rather surprising. A value has been assigned to "y" and "y" has then been immediately compared with that value and found to be different. The explanation is that the compiler has not spotted the repeated use of the same string constant and has stored multiple copies of it in memory. This underlines the fact that the actual value of a string constant is the address of its first character. Many compilers do share a single copy of identical string constants, in which case the third test reports "Equal 3". The ANSI standard leaves the choice to the compiler, so a program must not rely on either result.

Finally, an example using strcpy(). This program, called stall5, twiddles the case of every letter in the input string.

#include <stdio.h>
#include <string.h>

int main(void)
{
	char	istr[128];	/* input buffer */
	char	tstr[128];	/* translated string here */
	int	i;
	int	slen;		/* string length */
	while(1)
	{
		printf("Enter a string ");
		if(fgets(istr,sizeof istr,stdin) == NULL) break;
		istr[strcspn(istr,"\n")] = '\0';	/* discard the newline */
		if((slen=(int)strlen(istr))==0) break;	/* terminate */
		strcpy(tstr,istr);	/* make a copy */
		i = 0;
		while(i < slen)	/* translate loop */
		{
			if(tstr[i] >= 'A' && tstr[i] <= 'Z')	/* upper case */
				tstr[i] += 'a'-'A';
			else if(tstr[i] >= 'a' && tstr[i] <= 'z')	/* lower case */
				tstr[i] += 'A'-'a';
			i++;	/* to next character */
		}
		printf("   Original string = %s\n",istr);
		printf("Transformed string = %s\n",tstr);
	}
	return 0;
}

The following dialogue is typical
$ stall5
Enter a string aBDefgXYZ
   Original string = aBDefgXYZ
Transformed string = AbdEFGxyz
Enter a string ab   CD   123
   Original string = ab   CD   123
Transformed string = AB   cd   123
Enter a string :::x:::y:::Z:::
   Original string = :::x:::y:::Z:::
Transformed string = :::X:::Y:::z:::
Enter a string 

The program has preserved the original string by copying it to a different memory area before manipulating it.
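
The translate loop tests ranges such as 'A' to 'Z', which assumes that the letters have consecutive codes. That holds for ASCII but not for every character set (EBCDIC is the usual counter-example). A sketch of an alternative using the standard <ctype.h> classification functions is given below; the function name swap_case is hypothetical, and unlike stall5 it changes the string in place, so the caller must copy the string first if the original is still wanted.

```c
#include <ctype.h>

/* swap_case is a hypothetical helper; it changes the string in place */
void swap_case(char *s)
{
	for(; *s != '\0'; s++)
	{
		/* the ctype functions expect an unsigned char value */
		unsigned char c = (unsigned char)*s;
		if(isupper(c))
			*s = (char)tolower(c);
		else if(islower(c))
			*s = (char)toupper(c);
		/* anything else (digits, spaces, punctuation) is left alone */
	}
}
```

Applied to a copy of "aBDefgXYZ" it produces "AbdEFGxyz", the same result as the dialogue above.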

It is important that there is somewhere to copy the string to. A common programming error is illustrated below. This variation on the previous program is called stall6.

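#include <stdio.h>
#include <string.h>

int main(void)
{
	char	istr[128];
	char	*tstr;	/* ERROR: never made to point at any storage */
	int	i;
	int	llen;
	while(1)
	{
		printf("Enter string ");
		if(fgets(istr,sizeof istr,stdin) == NULL) break;
		istr[strcspn(istr,"\n")] = '\0';	/* discard the newline */
		if((llen=(int)strlen(istr))==0) break;
		strcpy(tstr,istr);	/* copies to wherever tstr happens to point */
		i = 0;
		do
		{
			if(tstr[i]>='A' && tstr[i]<='Z')
				tstr[i] += 'a'-'A';
			else if(tstr[i]>='a' && tstr[i]<='z')
				tstr[i] += 'A'-'a';
		} while(++i<llen);
		printf("   Original string = %s\n",istr);
		printf("Transformed string = %s\n",tstr);
	}
	return 0;
}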
This is what happened
$ stall6
Enter string abcdefghjikl
Segmentation fault (core dumped)
The programmer has probably assumed that there really is such a data type as a string and that strcpy() provides the facility to assign strings. The failure of the program is not surprising once you think about the initial value of "tstr". The initial value of uninitialised variables was discussed earlier. Clearly, copying the input character string to whatever location "tstr" happened to point to has either overwritten something important or attempted to access a memory location not available to the program. Occasionally this error will not cause program failure, because "tstr" happens to point to somewhere relatively safe and the program has only been tested with strings that were not long enough to cause damage when copied there.
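
One way to give the pointer somewhere to point is to ask for storage with malloc() before copying. The sketch below is one possible repair, not the only one (declaring "tstr" as an aggregate, as stall5 did, works equally well). The function name copy_string is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* copy_string is a hypothetical helper: returns a newly allocated copy
   of src, or NULL if no memory is available. The caller must free() it. */
char *copy_string(const char *src)
{
	char *dst = malloc(strlen(src) + 1);	/* +1 for the NUL */
	if(dst != NULL)
		strcpy(dst, src);	/* dst now points at real storage */
	return dst;
}
```

In stall6 the assignment tstr = copy_string(istr) could take the place of the call to strcpy(), provided the result is checked against NULL before use and passed to free() at the end of each pass round the loop.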


Exercises