Besides the desire for backwards compatibility, C++ also maintained the philosophy that performance was critical. Many design decisions of C++ were done to allow programmers to get maximum performance out of their programs at the cost of making the code more difficult to understand and more error-prone.
Java was originally developed as a programming language for programs embedded in electronic devices, such as microwave ovens, CD players, telephones, etc. Since this was a new marketplace, there was not a lot of legacy code that the Java designers needed to maintain compatibility with. Also, since the performance of architectures has improved substantially and the computing demands of these devices were not great, performance of the language was much less of a concern, but productivity of the programmers was considered quite important.
The end result is that Java is much more programmer-friendly than C++. James Gosling, the creator of Java, quips that Java is C++ without the guns, knives, and clubs. Fortunately for us, Nachos uses a restricted subset of C++ that excludes many (but not all) of the hazardous features. This document should give you a basic understanding of the subset of C++ that Nachos uses. This should help you read the existing Nachos code. You will certainly need to supplement this with a C++ text to get all the details. This document assumes that you already know Java.
The boolean data type consisting of the values true and
false is called bool in C++.
In C++, the data type char is an 8 bit value capable of
representing an ASCII character. An instance of the char type can
also be treated as an 8 bit integer. Therefore, you can do arithmetic on
variables declared as char.
In C++, you can declare any of the numeric data types to be
unsigned. This results in the lower bound for the type being 0. For
example, int normally ranges from -231 to
231. An unsigned int ranges from 0 to 232.
In C++, = is the assignment operator just as in Java. In C++,
= can also be used as an expression that returns the value being
assigned. This allows the following convenient way of initializing two variables
to the same value: a = b = 1;
1 is assigned to b. The assignment
expression returns the value assigned and assigns this to a.
C++ allows the condition controlling an if-statement or while-statement to be
an integer expression. If the integer expression evaluates to 0, this is treated
the same as the boolean value false. A non-zero value is treated
the same as true. This is bad programming style and should be
avoided, but you might run into this in the existing Nachos code. It would be
better to say: then to just say if (someInt != 0)
although they are semantically equivalent.
if (someInt)
Combining these last two paragraphs shows one of the most common syntactic
errors in C++ programs: Assume that the programmer intended to say int a = 0, b = 1;
if (a = b) {
...
}
a == b as
the condition, which is almost certainly the case. The programmer therefore
intended that the body of the if-statement would be executed only if
a and b had the same value. In Java, the code above
would give a compilation error, but it does not in C++ since = is
an expression. Instead the value of b is assigned to
a, so a now becomes 1. This value
(1) is returned as the result of the assignment expression. Since
integer expressions are allowed for conditions, this is ok. 1 is
treated as true and the body of the if-statement is executed, which is not what
the programmer intended. Be on the lookout for this simple error in your code!
In C++, you must declare the size of an array when you declare the array.
Memory is allocated for the array when the array is declared: Beware! C++ does not check array bounds like Java does. If
you pass in a negative number for an array bound or an array bound that is
greater than the size of the array, C++ will happily access some (seemingly
random) piece of memory. If this appears on the left side of an assignment
statement, it will happily change some (seemingly random) piece of memory.
Always check array bounds yourself if you are not absolutely certain that the
value is in the correct range!!!
int intArray[10];
In C++, it is possible to declare that a variable should be kept in a
register. This is typically done to improve performance but is really
unnecessary with modern compilers. Compilers are smart about recognizing which
variables are used most often and keeping those values in registers. If not used
extremely carefully, register declarations actually degrade performance. You
should avoid using them, but the existing Nachos code does use them so you need
to recognize what they are. register int i;
public,
private, or protected to the class definition. If the
class is public (in the Java sense), you put its declaration in a separate file
whose name ends in .h, while you put the definitions in a file with
the same name but ending in .cc. A class declaration can be broken
into three sections: a public section, a protected section, and a private
section. Instead of attaching these keywords to each class member as in Java,
you put the member in the appropriate section of the declaration as follows:
class Pair {
public:
Pair (int x, int y); // The constructor
int getX();
int getY();
void setX (int newValue);
void setY (int newValue);
private:
int x;
int y;
};
Also, note that C++ requires a semicolon at the end of a class declaration.
To make this class visible to another file, it must be included in that file:
There is no notion of a package in C++.
#include <pair.h>
The member definitions appear in the .cc file as mentioned earlier. Since
they appear outside the class declaration, each definition needs to declare
which class it is in using the class_name:: syntax: Any members that are fully defined in the class declaration
(usually just the variables) should not be defined in the .cc file.
int Pair::getX() {
return x;
}
If you want to specify a particular member from a specific class when doing a
function call, for example, you use the :: operator, as in SomeClass::someMethod()
The syntax for declaring a subclass is different in C++: This is a declaration of class SubClass : public SuperClass {
...
};
SubClass as a subclass of
SuperClass. Note that you must include the keyword
public before the superclass name. If you want to allow a method to
be overridden in a subclass, the superclass must include the keyword
virtual in its declaration of the method. Any class that contains a
virtual function and no definition of that function is implicitly abstract. It
is not possible to declare a class or function to be abstract. There is no
equivalent to Java's super keyword. If you want to refer to a
superclass function that is overridden in a subclass, you must explicitly
qualify the function name with the class from which it is inherited using the ::
syntax. Nachos does not use subclasses, so I will not go further into their
details.
printf to produce output. printf takes a variable
number of arguments. The first argument is a string. The string may have
embedded within it zero or more control sequences. For each control sequence
there must be an additional parameter to printf defining the value to use for
that control sequence. The control sequences used in Nachos are the following:
| %c | A character. |
| %s | String |
| %d | An integer to be represented as a decimal string. |
| %x | An integer to be represented as a hexadecimal string. |
| %f | A floating point number. |
printf is fprintf. fprintf is like
printf, except that it takes an additional argument before the
string which is a file pointer (for an already-open file). The output is written
to the file instead of to standard output. Here is an example use of
printf: char month [4];
int year;
strcpy (month, "Sep"); // Assigns a value to a string variable.
year = 1998;
printf ("The month is %s. The year is %d.\n", month, year);
scanf is the input function from C. Its format is similar to
printf. The first argument is a string often exclusively consisting
of control sequences and whitespace. For each control sequence, there must be an
argument that is the address
of a variable of the appropriate type. The memory
must be allocated already. sscanf is a variant of
scanf with an additional first argument representing a string to
parse instead of reading from standard input. sscanf is typically
used to convert a string to an integer. For example, suppose s
contains "1998", will set sscanf(s, "%d", &year);
year equal to the integer 1998.
The second (and preferred) way of doing input and output in C++ is using
streams. Nachos uses printf rather than streams so I will not
discuss them here.
(There is also an sprintf function that places the formatted
output in a string and a fscanf function that reads input from a
file, but these are not used in Nachos.)
#define (not final) as in: #define MAX_SIZE 10
#define is actually a much more powerful macro
mechanism. Everywhere that the defined name appears within the scope, it is
replaced by the definition, which could be a complex expression. This is often
done to make a statement look like a function call but without having the
runtime overhead of making a procedure call. For example, here is a macro
defining minimum: #define min(a,b) (((a) < (b)) ? (a) : (b))
Later in the code, the programmer can say: min (i, 4)
and it is expanded by the preprocessor to the compiler to: (((i) < (4)) ? (i) : (4))
Since this is done by the preprocessor, the expression is inlined
rather than being executed as a function call.
#define is one example of a compiler
preprocessor command. This is a command that is executed by a preprocessor that
scans the code prior to compilation. The preprocessor is run automatically when
you run the compiler. Two other common directives in C++ are #ifdef
and #ifndef. #ifdef takes a variable name for its
condition. If that variable name is defined, it evaluates to true and its body
is included in the source code that is compiled. #ifndef is similar
but includes its body if the variable is not defined. Both may have
#else clauses. They both end with the delimiter
#endif. #ifdef HOST_SPARC
#include
#endif
This is how C++ programmers typically port programs between
architectures. Architecture-dependent code is placed inside #ifdef statements.
When the code is compiled, the appropriate variable is set for the architecture
allowing the correct code to be compiled in. Unlike normal if-statements, these
if-statements are evaluated at compilation time. The branch that is true at
compilation time is compiled into the program. Branches that are false are not
compiled in. The condition is not tested at runtime.
. syntax required to dereference them.
If a data declaration is to be global and shared between multiple files, it
is declared in one file and declared to be an extern in the other
files. We all know that global variables are bad, so we shouldn't do this,
however, there are some instances of this in Nachos. A function can also be
declared this way, but it is better style to put the function declaration in a
.h file and #include the .h file rather than extern
the function declaration. Occasionally you will see something like:
extern "C" {
<a list of declarations>
}
This indicates that the list of declarations has been declared as
pure C code, not within classes.
The main program for a C++ program is called main, but it is
declared externally to any class. Its signature is: void main (int argc, char **argv);
The first parameter is the number of arguments. The second parameter is an array of strings, each string containing one command-line argument.
struct: struct Date {
char *month;
int date;
int year;
};
struct Date someDate;
Typically, when declaring a type one gives the type a name. Oddly enough,
creates a type named struct Date. To give it a simple type name, a
slightly different syntax is required: typedef struct {
char *month;
int date;
int year;
} Date;
Date someDate;
You will find quite a few struct declarations in existing Nachos code. You should not create any new struct declarations. Instead you should create a class whose data members are like the struct fields.
enum PrimaryColor {
red,
yellow,
blue
};
PrimaryColor c;
c = red;
Enumerated types are simulated in Java in the following way. The programmer declares a class (equivalent to the enumerated type) that provides a number of public constants (representing the enumerated values).
union String_or_int {
char *someString;
int someInt;
};
The union itself does not keep track of which type is in it, so a
union is typically used inside a struct where a second field of the struct
remembers the type currently in the union field: struct S_or_i {
bool containsInt;
union String_or_int x;
};
int are values. In C++, using a
type name always means that the variable will have a value of that type. It is
possible to introduce pointers to values explicitly and also to create types
whose values are pointers to other values. Suppose we have a Date
type, here is how we would declare a variable that is to contain a pointer to a
date and also a type to represent a pointer to a date: Date *someDate; // Variable containing a pointer to a Date
typedef Date *DatePtr; // Type defining a pointer to a Date
DatePtr date2; // Variable containing a pointer to a Date
date2 = someDate;
someDate and date2 both contain pointers
to dates. The assignment statement results in both variables pointing to the
same memory location and therefore sharing the same value as happens in Java.
Contrast the above with the following similar code that does not use
pointers: Assuming that Date someDate;
Date date2;
date2 = someDate;
Date is simply a struct type, not a
pointer type, the assignment statement above copies the value from
someDate to date2. If the value referenced in either
variable is changed, it has no effect on the other value. In Java, you would
need to explicitly clone the value to have this effect. Unless you know the
definition of the type involved in an assignment, you cannot tell whether the
assignment results in value-sharing or value-copying.
A pointer is dereferenced using the -> syntax: In C++, some_pointer->some_field
this is a pointer, not an object. To
dereference it, you must say this->member
To get a pointer to an object, you use the & operator:
int *IntPtr;
int anInt;
anInt = 1;
intPtr = &anInt;
To get the value pointed to by a pointer, you use the *
operator: int *intPtr;
int anInt, int2;
anInt = 1;
intPtr = &anInt;
int2 = *anInt;
In C++, all parameters are passed by value. If you want to be able to change
the value of a parameter as a side effect, you must declare the parameter type
to be a pointer and you must pass in the address of the variable that you want
to change: void increment (int * anInt) {
(*anInt)++;
}
int i = 0;
increment (&i);
If you want to pass a pointer or an array to a function, but you do not want
the object to be changed, you can say that the parameter type is
const: void doSomething (const int * anInt) {...}
int *intArray;
// Assume the array has been allocated and given memory.
doSomething (intArray);
In C++, objects (and structs) can either be automatic or manually allocated.
Variables whose types are not pointers (such as classes or structs) are
automatically allocated and deallocated. They are allocated when they are
declared and deallocated at the end of the block in which they are declared. You
do not use With pointer types, the programmer must explicitly allocate and deallocate
memory. You must allocate memory before assigning a value to the object.
Allocation for classes is done with If you want to do anything special when deleting an object, you must define a
deconstructor for the object's class. A typical thing to do is to delete the
objects referenced by the object being deleted (if you are sure they are the
last reference to that object!). The syntax for declaring a deconstructor is:
You must also allocate memory for pointers to other types (typically
structs), as in C. To do this you need to know how big the structure is. You can
find this out using the
There is no string data type in C++. Strings are simply arrays of characters.
Since we typically want to allow variable length strings, string variables are
typically declared to be pointers to characters: In a few places in Nachos, you will syntax like the
following:
There is no equivalent of JavaDoc for C++.
C++ does not have an C++ does not have interfaces.
C++ does not come with a large standardized class library as Java does. If
you look at sections 2 and 3 of the Unix manual (using Memory Management
Java is a garbage-collected language. C++ is not. In
Java, memory is allocated for an object when that object is constructed. The
memory is deallocated when there are no more references to that object.
new to allocate a variable whose type is a class.
new as in Java. You should
deallocate an object when you believe there are no more references to that
object. The syntax for deallocating an object is: where delete list1;
list1 is the name of an object. If you have an
array of objects, and you want to delete all the objects in the array, say
For every object allocated with delete [] objArray;
new there
should be a deallocation with delete.
where ~MyClass();
MyClass is the name of the class containing the
deconstructor.
sizeof function: The syntax for allocating memory is: sizeof (some_type)
To free such memory, you use the some_struct = (some_struct_type *) malloc (sizeof (some_struct_type));
free function: For every variable allocated with free (some_struct);
malloc there
should be a deallocation with free.
Similarity between Arrays and Pointers
Suppose you want to have an array
variable, but you do not know how big the array should be. Since C++ requires
you to declare the size of the array when you declare the array variable, you
cannot declare it to be an array. Instead you must declare a pointer to the
desired element type and later allocate the appropriate amount of memory
yourself: Even though you declared the variable to be a pointer, you can
still dereference it as an array!
int *intArray;
intArray = (int *) calloc (10, sizeof (int));
All strings must have the special null character '\0' as their last
character to identify the end of the string since there is no length recorded
with the string. String constants are enclosed in "" and implicitly end in the
null character. Since strings are pointers, assigning one string variable to
another results in the two variables pointing to the same piece of memory.
Changing one string changes the other. If you do not want this effect, you must
use the strcpy function: char *someString;
Remember to free the string when it is no longer being
used.
char *month; // Declare the string
month = malloc (4); // Allocate memory for string ending in null.
strcpy (month, "Sep"); // Assigns a value to a string variable.
This is a pointer to a pointer of a character. Keeping in mind that
pointer declarations often mean variably-sized arrays, this syntax represents a
variably-sized array of strings (since char **stringArray;
char * means string).
Features Unique to Java
There is no equivalent to the
synchronized keyword. Threads are also not built into the language.
When we discuss threads and synchronization in class, we will discuss how this
is done in C++.
instanceof operator.
xman),
however, you will see a large collection of C functions that can be called from
C++. Some of the common functions, like strcpy, are the same across
all operating systems, but, in general, the routines in these libraries are not
standardized (particularly the system calls in section 2). You need to explore
the man pages to see what is on the particular operating system you are using.
This lack of standardization is one reason that C++ programs are not as portable
as Java ones.
Last modified by Barbara
Lerner on September 11, 1998.