Arithmetic and Data Types - Floating Point Numbers

Chapter 4, Section 2

The floating point data type provides the means to store and manipulate numbers with fractional parts and a very large range of sizes. The ANSI standard describes three types of floating point storage known as float, double and long double. Different C compilers running on different computer systems are allowed by the ANSI standard to implement the various types of floating point numbers in different ways, but certain minimum standards must be met. The basic characteristics are summarised in the following table.

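Type	Maximum Value	Significant Digits	Context
float	1.0 X 10³⁷	6	ANSI specified minimum acceptable
double	1.0 X 10³⁷	10	ANSI specified minimum acceptable
long double	1.0 X 10³⁷	10	ANSI specified minimum acceptable
float	3.403 X 10³⁸	6	Actual characteristics on SUN Sparc station
double	1.798 X 10³⁰⁸	15	Actual characteristics on SUN Sparc station
long double	1.798 X 10³⁰⁸	15	Actual characteristics on SUN Sparc station
float	3.4 X 10³⁸	7	Actual characteristics on a PC using the Turbo compiler
double	1.7 X 10³⁰⁸	15	Actual characteristics on a PC using the Turbo compiler
long double	1.1 X 10⁴⁹³²	19	Actual characteristics on a PC using the Turbo compiler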

For example, a double variable on the SUN Sparc Station ANSI compiler will store numbers up to 1.798 X 10³⁰⁸ (a number with 309 digits before the decimal point) to an accuracy of about 15 significant decimal digits.

It will be noted that double and long double are the same on the SUN Sparc station. This is clearly allowed by the standard, which equally clearly allows long double to support a larger maximum value and more significant digits if the compiler writer so wishes and the underlying hardware can manipulate such numbers. Most C programmers tend to use the double floating point data type rather than float or long double, for largely historical reasons.

Memory locations of any of the floating point data types can be declared by giving the type name and a list of identifiers. Floating point locations can be initialised as part of the declaration. There are no special rules for naming floating point data locations. Declarations for memory locations of different data types must be separate declarations but as many memory locations of a single type as required can be declared in a single declaration. The names of memory locations of all the various types must be distinct.

The values of floating point numbers can be written using the conventional notation involving a decimal point. A notation such as 3.7 implies a constant of type double. In the unlikely circumstances that a constant of a particular type is needed then one of the letters "f" or "F" for a float constant or "l" or "L" for a long double constant can be written as the last character of the constant. There are other notations which will be described later.

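Floating point numbers can be converted to external form using the printf() function with the following conversion specifications.

f	float or double (a float argument is automatically converted to double when passed to printf())
lf	double (the l has no effect here; it is accepted by C99 and later compilers but was not defined by the original ANSI standard)
Lf	long double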

Between the % that introduces the conversion specification in the printf() format string and the f, lf or Lf that terminates the conversion there will usually be a field specification of the form

w.d

where w specifies the overall field width and d is the precision specification which tells printf() how many digits to print after the decimal point. If the precision is not specified then a default of 6 is used. For the use of precision specifications with integers see later. The technique for adjusting the field width and precision while the program is running is also discussed later.

The following program illustrates the declaration, initialisation and output of floating point numbers.

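#include <stdio.h>

int main(void)
{
	double	x=213.5671435;
	double	y=0.000007234;
	printf("x = %10.5lf\n",x);	/* width 10, 5 digits after the point */
	printf("y = %10.5lf\n",y);
	printf("x = %5.2lf\n",x);	/* width 5 is too narrow for the value */
	printf("y = %10lf\n",y);	/* no precision, so the default of 6 */
	printf("x = %3.1lf\n",x);
	return 0;
}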

It produced the following output.

x =  213.56714
y =    0.00001
x = 213.57
y =   0.000007
x = 213.6

There are several interesting points to notice here. On the second, third and fifth lines notice that the output has been rounded. On the fourth line note the default precision, and on the third and fifth lines note that the output field width has expanded to accommodate the actual data.

It is, of course, quite admissible to omit the field width entirely and just quote a precision with a preceding period; the required number of digits will then be displayed and the field width will expand to suit.

All the arithmetic operations described in the previous chapter, with the exception of those involving the modulo operator ("%"), may be applied to floating point numbers. The division operator applied to floating point numbers yields a floating point quotient; there are no complications with remainders or truncation. The effect of applying any of the operators to a mixture of floating point data types or a mixture of floating point and integer data types will be discussed later. There are no extra arithmetic operators for floating point data types. In particular, there is no operator for raising a floating point number to a power or taking its square root. The ANSI standard defines library functions for these and for many other common mathematical functions such as sines and cosines.

Floating point numbers may be read in using the scanf() library function in exactly the same way as integers were read in, except that the appropriate floating point conversion specification must be used: %f for a float, %lf for a double and %Lf for a long double. Any normal way of writing a floating point value may be used externally, including integers, which are properly converted to the equivalent floating point number.

Floating point arithmetic is illustrated by the following program.

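#include <stdio.h>

int main(void)
{
	double  data;
	double  x=490;	/* the integer 490 is converted to 490.0 */
	data = (2.0*x)/3.5;
	printf("data = %20.10lf\n",data);
	x = 1.0/data;	/* reciprocal of the previous result */
	printf("   x = %20.10lf\n",x);
	return 0;
}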

which produced the output

data =       280.0000000000
   x =         0.0035714286

And finally the following program called fp3 illustrates floating point input.

#include <stdio.h>

int main(void)
{
	double	x,y;
	printf("Enter values for x and y ");
	scanf("%lf%lf",&x,&y);	/* %lf reads a double */
	printf("The sum of x and y is %10.5lf\n",x+y);
	return 0;
}

which proceeded as follows

$ fp3
Enter values for x and y 234.567 987.654321
The sum of x and y is 1222.22132
$ fp3
Enter values for x and y 1 2
The sum of x and y is    3.00000
