
Results of Midterm 2

Scores were slightly better than on the first midterm.

High - 95 (3 of them!)

Low - 50

Average - 78.9

90s - 12

80s - 37

70s - 32

60s - 11

50s - 9

I have also graded Project 2, and was in general quite pleased with the results. Most people received all or almost all credit.

Check your updated scores on the course WWW page. In particular, discuss with your TA if your laboratory exercises to date have not been recorded correctly.

"Baseball is the very symbol, the outward and visible expression of the drive and push and rush and struggle of the raging, tearing, booming, nineteenth century." - Mark Twain

Pizza with the Prof, 6PM tonight - one or two spots remain.

Data Management

Thus far, we have seen several types of software which can be used to enter, modify, and retrieve data.

Word processors enable us to enter and manipulate textual data. Spreadsheets enable us to enter and manipulate numerical data. The WWW enables us to make data files public for other people to use.

However, none of these systems are suitable when we have large amounts of data to maintain and share with other users.

In a university, such needs include maintaining grade and employee records. Hospitals must maintain patient records. Stores must maintain customer and inventory data.

The need for reliable, convenient access to data makes the problem different from what we have seen.

Data Accuracy or Integrity

In important applications, such as credit bureaus and grade records, accurate data entry is essential.

Even accurate typists make errors once every few hundred keystrokes, and such errors are not easily detected by proofreading (example: consider checking columns of numbers).

Designing a data entry system to detect and minimize errors is critical to ensure the quality of the data.

Sanity checking should be performed on each entered value. If anything other than "M" or "F" is typed to enter sex, this should be flagged immediately.
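As a minimal sketch of sanity checking, the Python functions below test each field as it is entered. The "M"/"F" rule comes from the example above; the score field, its 0 to 100 range, and all function names are assumptions made only for illustration.

```python
# Hypothetical field rules; only the "M"/"F" check comes from the lecture.

def check_sex(value):
    """Return True only if the entry is exactly "M" or "F"."""
    return value in ("M", "F")


def check_score(value, low=0, high=100):
    """Return True if value parses as an integer within [low, high]."""
    try:
        number = int(value)
    except (ValueError, TypeError):
        return False
    return low <= number <= high


def sanity_errors(record):
    """Collect one message for each field that fails its check."""
    errors = []
    if not check_sex(record.get("sex", "")):
        errors.append("sex must be M or F")
    if not check_score(record.get("score", "")):
        errors.append("score must be an integer from 0 to 100")
    return errors
```

A data entry program would call sanity_errors as soon as a record is typed and refuse to store the record until the list it returns is empty.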

More difficult is consistency checking, making sure that all the fields make sense together. Is the zip code consistent with the area code, and are both consistent with the address?
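A consistency check needs reference data that relates one field to another. The sketch below assumes such a table exists; the table contents here are invented placeholders, not real postal or telephone data, and the function name is hypothetical.

```python
# Toy lookup table, invented for illustration only. It is NOT real postal data.
AREA_CODES_BY_ZIP_PREFIX = {
    "000": {"555"},
    "001": {"555", "556"},
}


def zip_matches_area_code(zip_code, area_code, table=AREA_CODES_BY_ZIP_PREFIX):
    """Return True if the area code is one allowed for this zip prefix.

    Unknown prefixes return False so that the record is flagged for review.
    """
    allowed = table.get(zip_code[:3], set())
    return area_code in allowed
```

The hard part in practice is not the lookup itself but obtaining and maintaining a reference table that is itself accurate.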

What steps are taken to ensure the accuracy of recorded grade information? What about the status of your checking account?

In entering important financial data, certain services have two different people enter the same data independently and then compare the two versions.
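The double-entry idea can be sketched in a few lines of Python. The function name and the sample field names are assumptions; the point is only that any field on which the two typists disagree is reported for a human to resolve.

```python
def compare_entries(first, second):
    """Return the sorted field names where two independently keyed records differ."""
    fields = sorted(set(first) | set(second))
    return [name for name in fields if first.get(name) != second.get(name)]
```

For example, if one typist enters an amount of "250.00" and the other enters "520.00", the function returns ["amount"]. The check fails only when both typists make the same mistake in the same field.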

Recall the saying "Garbage in, Garbage out".

Data Security

Many databases contain important data, which must be kept confidential or protected.

Depending upon the application, there may be many different types of users, each of whom needs access to different sections of the database.

At the credit bureau, data entry people need to be able to modify the database by adding new records, but should be forbidden from reading existing records.

Clients of the bureau must have read access to certain people's records, but must not be able to modify the database.

Ordinary people must have access to their own data to correct it, but must not be able to read other people's data.

Such security must be maintained using passwords, but it is often difficult to allow exactly the access each user needs.
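The credit bureau rules above can be written down as a table of roles and permitted actions. This is a simplified sketch: the role names, action names, and the is_allowed function are all hypothetical, clients are given read access to every record rather than only to certain people, and the password step that establishes who the user is has been left out.

```python
# Hypothetical roles and actions modeled on the credit bureau example.
PERMISSIONS = {
    "data_entry": {"add"},                   # may add records, may not read them
    "client": {"read"},                      # may read, may not modify
    "subject": {"read_own", "correct_own"},  # may see and fix only their own data
}


def is_allowed(role, action, user_id=None, record_owner=None):
    """Decide whether a user in the given role may perform the action."""
    allowed = PERMISSIONS.get(role, set())
    if action in allowed:
        return True
    # Fall back to an "own record only" permission when the user owns the record.
    owns_record = user_id is not None and user_id == record_owner
    return owns_record and (action + "_own") in allowed
```

Even this small table shows why exact access is hard to get right: every new kind of user or action adds another row or column that must be decided correctly.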

Security concerns are why people fear big centralized databases.

Another class of security concerns revolves around backups. Keeping a copy of yesterday's database is not enough in many critical applications. What if you make a bank deposit and suddenly the computer goes down? How can you be sure you didn't lose your money?

Maintenance

In most interesting applications, the contents of the database are always changing.

We must have the capability of inserting new records, deleting old records, and modifying the contents of existing records.

These problems are particularly difficult in a world where many people have access to the database at once. How do we prevent one person from deleting a record someone else needs?

Think about the national airline reservation system. Literally thousands of travel agents are working on the same system simultaneously. How can you prevent two people in two different parts of the country from both getting the last seat on a plane?
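One standard answer to the last-seat question is locking: checking for a free seat and claiming it must happen as a single step that no other agent can interrupt. The sketch below simulates this on one machine with Python threads. The Flight class and simulate function are hypothetical, and a real reservation system would rely on the locking or transaction facilities of its DBMS rather than an in-process lock.

```python
import threading


class Flight:
    """A flight with a fixed number of seats, safe for concurrent booking."""

    def __init__(self, seats):
        self.seats = seats
        self.passengers = []
        self._lock = threading.Lock()

    def book(self, passenger):
        """Claim a seat if one is free; return True on success."""
        # The lock makes "check, then decrement" one indivisible step.
        with self._lock:
            if self.seats == 0:
                return False
            self.seats -= 1
            self.passengers.append(passenger)
            return True


def simulate(agents=50, seats=1):
    """Have many agents try to book at once; return the flight and outcomes."""
    flight = Flight(seats)
    results = []

    def agent(name):
        results.append(flight.book(name))

    threads = [threading.Thread(target=agent, args=(f"agent-{i}",))
               for i in range(agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return flight, results
```

With one seat and fifty agents, exactly one call to book succeeds. Without the lock, two agents could both see a free seat before either had claimed it.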

Files, Records, and Fields

In a large database, all of the data is stored on disks in files.

The relevant parts of the data can be copied into main memory to work with.

Data in a database file is typically organized into records, such that each record has the same format. For example, a file of customer records might contain thousands of people, each represented by the same data fields: name, SSN, date, amount, ...
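To make the file, record, and field vocabulary concrete, the sketch below stores customer records in a comma-separated text file using Python's standard csv module. The four field names come from the example above; the choice of CSV as the file format and the function names are assumptions for illustration.

```python
import csv
import io

# Every record has the same fields, in the same order.
FIELDS = ["name", "ssn", "date", "amount"]


def write_records(stream, records):
    """Write a header line, then one line per record."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


def read_records(stream):
    """Read the whole file back into memory as a list of records."""
    return list(csv.DictReader(stream))
```

Here the stream plays the role of the file on disk, each line after the header is one record, and each comma-separated value is one field. An io.StringIO object can stand in for a real file when experimenting.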

Database Management Systems

Database management systems (DBMS) are the software that manages access to a database.

Important commercial DBMS include Oracle, Sybase, dBASE, and Microsoft Access.

Dealing with raw files is difficult and ugly. DBMS differ according to the model of data that they present to the user.

The hierarchical and network database models explicitly organize the data items according to a given structure. This makes many access operations fast, although constructing the relationships is unnatural for many applications.

The relational database model stores information in tables, and then operates on the tables to extract the desired information. Most modern databases are relational databases because of the freedom this model gives in organizing and combining data.
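The relational idea can be tried out with SQLite, a small relational DBMS that ships with Python as the sqlite3 module. SQLite is not one of the systems named above and is used here only because it needs no installation. The table names, column names, and sample rows are invented for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE purchases (customer_id INTEGER, date TEXT, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ada"), (2, "Grace")])
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)",
                     [(1, "1996-11-01", 20.0),
                      (1, "1996-11-05", 5.0),
                      (2, "1996-11-02", 12.5)])
    return conn


def total_spent(conn):
    """Combine the two tables to get each customer's total purchases."""
    query = """
        SELECT c.name, SUM(p.amount)
        FROM customers AS c
        JOIN purchases AS p ON p.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()
```

Neither table stores the totals. They are extracted by an operation on the tables (a join followed by grouping), and the same two tables could answer many other questions without being reorganized.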





Steve Skiena
Wed Nov 13 14:28:38 EST 1996