CSE 532 Theory of Database System
Spring 2005 Project Page
Announcements:
-
Hello
all, thank you(some of you)
for the emails with kind wishes. The same to you! However, the record for
ALL grade(incluing hw1~6, the Midterm, the Project, the Final which is graded
by Professor Gupta, and the accumulative total score for the whole semester!)
had already been sent to Prof. Gupta. I considered your hw6(if you did it)
for your accumulative ONLY IF it would help you to get a higher total score.
So please stop worrying about your not so high hw6 score. Furthermore, I
don't have more control on the record now, and I do not have the permission
to disclose the final exam grade or the accumulative total. I have not touched
the final paper at all and do not have the graded final or the score. Welcome
to send me any email if you have further concern, except for asking me to
tell you the final score or show you the graded final. Well, it is really
nice to stay with you this semester:-) Wish all of you have fun in the coming
Summer! And good luck with your bright future!
-
Hi,
folks, the record for ALL grade(incluing hw1~6, Midterm and the Project)
can be downloaded from here.
Please send me email as soon as possible, if you have any question about
that. Good luck with your Final! Have fun in the coming Summer!
-
Some students did the
project alone. I fully understand that you
need to work more than the person in a group of two. However, as what was
said in the begining of semester(and you still can check that on this webpage),
"You can choose one partner to do the project-I jointly. You can also
choose to work alone, but this will not reduce your load of work. Your work
will still be evaluated as a group work". So I cannot give you extra
credits for that. That is not fair to the group workers. You might talk
to the professor to see whether you can get something from him.
-
Some students sent emails
to ask for the printed copy of XMLSpy(or
eXsit) tutorial/help document.
I do not have printed
copy. If you really want, you can print them out in computer science graduate
lab 1239. If you still do not know the combo, please send email to me.
-
I received tons of emails during the
spring break for the help on the project-I. Some students just began to
work. I can understand that, since I am also not an early bird. However
I think that, sometimes, the project will take you much more time when you
work under pressure than you are more relaxed. Please try to start to work
on project-II as soon as it is posted out, and do not wait until the last
minute.
- [Posted on March 25] Statistic about Homework-2: Maybe most
of you have already known the good news. Average is 96.90/100, 36 students
got 100/100.
-
[Posted on March 13] Workload of the project-I
has been reduced by Dr. Gupta. You only need to submit the following files
to me. Should you have any questions about the description please do not
hesitate to send an email to me. You only need to submit the following to
zhiquan.gao AT gmail DOT com :
. XML document corresponding to the data.
. XML Schema (including the integrity constraints) with comments.
. XQuery queries (in editable format) with comments.
. Group work load partition.
-
Due to HW 3 and Midterm, the professor
pushes the project-I due date to 3/29 (after spring recess). Furthermore,
my office hour has already been changed to every Wednesday, 10AM-12PM, CSB
1208. [posted on March 7]
-
How to submit Project-I [posted on March 3]:
Send me the following files to the email
address here (zhiquan.gao
AT gmail DOT com). If you can, please zip
them into a small file.
-
A detailed project document (including
the Entity-Relationship
(ER) design and the diagram, a
clear description of the database scheme, description
of integrity constraints, XQuery statements for each query listed and the
executed results in eXist DBMS, and your group work load partition).
-
XML
Schema file and XML
file.
-
Queries
code in an editable format (.txt or .doc).
-
Some students came to ask questions about
Homework 1. Yes, you are required to write SQL for problem #3 according
to the description of the problem. I gave 0 credits for those students who
only wrote down relational algebra instead of SQL. However, later I knew
that Dr.Gupta permited to use relational algebra for this question. So I
regraded the homework again. The students who only wrote down relational
algebra instead of SQL will still get full credits for this problem. However
some students' paper had already been sent out, so if you fall into this
category and still get 0 credit for this problem please do come with your
homework and I will regrade your paper. Another simple way is, to make a
copy of your homework, place it into my mailbox near Dr.Gupta's office,
and send me an email. I will reply you once I change your grade record.
-
Statistic about Homework-1 [posted on March 3]:
Some students came to ask for the statistic about homework-1.
In case if you are interested, here is the data: average is 76.69/100, 14
students got 100/100.
All
groups are reqired to submit the
following document before March, 20th.
1.
A detailed project document. The
document should include:
- The Entity-Relationship
(ER) design and the diagram of the complete project.
- A clear description
of the database scheme, including a discussion of your design decisions.
- Description of integrity
constraints.
- XQuery statements for
each query
listed and the executed results in eXist
DBMS.
- You must clearly state
who does which part of the job in your project document.
2.
XML Schema file and XML
file
3.
Queries code in an editable format (txt or doc, no pdf!)
1.
First based on project-data,
design schema carefully with constraints. You MUST include ALL information
in your database system. You can
write it in a simple editor, or you can use XMLSpy.
Then put all data into an XML file.
2.
Now try to use XMLSpy
to validate that your XML file conform to your schema.
3.
Write queries
required in the project, and queries for constraints that cannot be included
in XML schema file. Try these queries in eXist
DBMS to get correct results.
-
You
can choose one partner to do the project-I jointly. You can also choose
to work alone, but this will not reduce your load of work. Your work will
still be evaluated as a group work. There are pros and cons in working
with a partner. It can be a frustrating experience to find out that your
partner is piggy-backing on you. So, choose your partner carefully! We
will NOT mediate disputes. The division of work between partners must
be vertical, not horizontal. This means that the two partners must collaborate
on each part of the project and be on top of what the other partner is
doing. Each student must be aware of and understand the techniques used
by his/her partner. Your final document must clearly state who does what
part of the job.
-
QUERIES
You are required to implement the following queries in eXist DBMS. All these
queries must be implemented entirely in XQuery with its object-oriented
features.
1. Give me the telephone number of the theatre that had shown the largest
number of movies produced by FOX or Universal.
2. How long is the movie that had the largest audience in the first quarter
of 2004?
3. For any given actor A, list the actors who played in the same movie with
A, or in the same movie with an actor who played in the same movie with
A, or in a movie 3 steps removed, or ... recursively.
4. How many movies released in 2003 had a loss in 2004 (i.e., the box office
for 2004 were below the cost of the movie)?
5. Which directors have the largest number of Cesars or Oscars award movies?
6. For each year list the movies with the second highest box office for
that year.
7. Which actor played in the largest number of movies where we count only
the movies whose box office were within 10% of the highest box office in
the year the movie came out?
8. In which years the Oscars were given to the largest number of male actors
aged 35 and more?
9. Which actor or actress played in the second largest number of movies
produced by FOX or Dreamworks between 2002 and 2004?
10. List directors such that the total number of Oscars-winning (not necessary
to win Oscar for this movie) actors playing in this director’s movies
is the second largest.
-
In project-I, you are asked to design
and implement an object-relational
database system application for storing information about movies and local
theatre shows. A similar but much more complicated one is here.
The data you use in this project is here.
You MUST include ALL information from the datapage
in your database system. Besides, you are required to use the eXist DBMS,
which supports the XQuery language. You need to create your data using an
editor in plain XML files and then store them in eXist and do your queries
in eXist. Make sure that you design XML Schema appropriate for this application
and include them in your design document. With XML you can use full object-oriented
design. Your XML files must conform to your schema and you must try to validate
it using one of the available validators, such as http://xml.apache.org/xerces-j/
or http://www.w3.org/2001/03/webdata/xsv
Try to make as many of the constraints as possible be part of your XML
schema documents. The rest should be expressed as XQuery queries so that
the constraint will be considered to be violated if the corresponding query
returns a non-empty answer. (This approach is analogous to assertions in
SQL). Implement and test the above constraints as well all the queries.
The due of Project-I is MARCH, 20th.
-
Please
read XMLSpy Tutorial (pdf,
chm)
and XMLSpy
User-Reference
during this weekend (Feb. 19th and
20th). They
will help you get familiar with the software as quickly as possible.
XMLSpy
-- an XML development environment:
XMLSpy
can be downloaded from http://www.altova.com/.
You can choose to download a Professional-Edition,
Home-Edition
or even Enterprise-Edition.
Home Edition is FREE, and it is enough for our project. For other editions,
new customers can request a free 30-day evaluation key-code from within the
application.
In
this project, you can use XMLSpy to:
- Write
XML schema: There are very convenient graphic
tool in XMLSpy.
- Write
XML file.
- Validate
XML file according to schema.
Please read XMLSpy Tutorial (pdf,
chm) and
XMLSpy
User-Reference
during this weekend.
They will help you get familiar with the software as quickly
as possible.
eXist
-- an XML database that supports XQuery and other things:
Information
for eXist can be found here: http://exist.sourceforge.net
. And there is a Quick
Start Guide.
1.
Download and install: If you want to
download it and install it on your own computer, stable version of installer
eXist-1.0b2.jar
is recommended. After you download it, simply click on the jar file icon, or
run it with java –jar eXist-1.0b2.jar. After installation, you can easily
access help documentations on your computer. In
graduate lab, eXist can also be installed in your H drive. Remember to make
sure that you have enough space left before installation.
2.
Load XML files and execute Xqueries:
have a look at Quick
Start Guide and XQuery
Docs.Notice
that eXist doesn’t support query on document fragments constructed within
the XQuery, i.e. a query like the following will fail:
let $x := <test>Test</test> return
$x/test.
You
have to store the fragment into the database first. Look at Manipulating
Database Contents for
example. You can use the provided xmldb extension
functions for that.
Frequently
Asked Questions:
1)
For query 2, what is the definition of "largest audience"?
Highest box-office.
2) When you mention "actor",
such as query 3,do you mean male actors or both male and female actors? same
question for directors, do you mean both male and female directors, unless otherwise
specified?
Actor ONLY means male performer
here while actress only means female performer. However, director has no specific
meaning of male or female and should include both.
3) When you mention "box
office", such as query 6, do you mean combined box office for the same
movie?
Yes.
4) Should we assume that the
names of actors and directors is always CAPITALIZED and is a single string?
Same question for the company name FOX.
I won't be strict at whether
you use capital or not. Both are okey once you return the correct result of
your queries. But do NOT use "actor1", "director2" and "company3",
etc. I want the names. Whether or not to use a single string is decided all
by yourself in the schema design.
5) Does awards have a special
designation, such as best actor/director, or each person affiliated with an
award winning movie is given that award? For example if movie M wins an award
A, does each actor and director affiliated with M get A?
No. Movie wins award for itself.
Actor/Director wins award for themselves. They do not have to be affiliated
with each other. A movie can win award even only for its music or costume while
the director or actor(s) is/are not good at all.
6) Should we assume that no
two movies can have the same name?
Sure no two movies have the same name in this
project. totally 12 movies in our project.
7) In Homework#2 question 3,
for EDB GraphEdges, if theres an edge between nodes n1 & n2 in a graph G1,
then does it mean that both entries (G1, n1, n2) AND (G1, n2, n1) are present
?
NO. Because here assume that
the given graphs are DAG, all edges are DIRECTED.
8) How to Create a Zip File
- On a unix or linux system you can zip files using the zip
command in a terminal:
- zip cse532yourSBID.zip file1 file2 file3 ..etc.
- When using Windows, you will need a tool like WinZIP - if
you do not have a copy of WinZIP you can obtain one from http://www.winzip.com/
To create the zip file containing the files to submit:
- Make sure you have WinZIP installed.
- Locate the files you want to submit using Windows Explorer.
- Select the files using Ctrl+left mouse button.
- With the mouse over one of the selected files, from the
popup menu select the WinZIP -> Add files to zip... option.
- Enter an appropriate name for the zip file in the requester
that pops up. Try to use "cse532your SBID" as your zip file name.
- Okay the requester, this should start the packing process.
- Submit the
zip file this operation creates.
Useful
links: