XMUT103
Assignment 4: Analysing and Measuring Sets, Binary Trees, Sorting and lambdas
Due 11 May , 19 pm
Download zip file of
necessary code and data and extract it to an Assig4 folder
(eg, in a COMP103 folder in your home folder).
It should contain templates for the Java programs that will be marked.
Important note about the style of your code.
To submit
- Report.pdf
- DecisionTree.java
- EarthquakeSorter.java and Earthquake.java
Part 1: Analysing and Measuring Sets (Weight: 15%)
We want to find the number of unique words within a given file of words. You can use three different types of collections to do this:ArrayList
, HashSet
and TreeSet
.
The VocabularyMeasure.java
file has an example code for measuring the cost of finding the number of unique words.
Read and understand the code, write a brief report (submit as Report.pdf) to: - Give your analysis of the costs from the following parts using O(...) notation.
- Draw a graph to show how efficient each of the collections (i.e., ArrayList, HashSet and TreeSet) is with an increasing number of words. The x-axis (the dependent variable) of the graph should be the number of words and the y-axis (the independent variable) should be time (in seconds).
- Document your conclusions, and how you arrived at these conclusions.
(a): ArrayList
If you used anArrayList
with n items, what would be the cost?
(Express the cost in big-O notation in terms of n).
Hint: For each line, work out the cost of doing that line once, and the number of times (worst case) that the line will be repeated. Then fill up the total cost line.
int count = 0; // cost= O( ) times= List<String> vocab = new ArrayList<String>(); // cost= O( ) times= try{ Scanner sc = new Scanner (Path.of(fname)); // cost= O( ) times= while (sc.hasNext()){ String word = sc.next(); // cost= O( ) times= count++; // cost= O( ) times= if(!vocab.contains(word)){ // cost= O( ) times= vocab.add(word); // cost= O( ) times= } } sc.close(); // cost= O( ) times= }catch(IOException e){UI.println("bad"+e);} UI.printf("Words = %,10d Vocab = %,d\n", count, vocab.size()); // Total Cost = O( )
(b): HashSet
If you used anHashSet
with n items, what would be the cost?
(Express the cost in big-O notation in terms of n).
int count = 0; // cost= O( ) times= Set<String> vocab = new HashSet<String>(); // cost= O( ) times= try{ Scanner sc = new Scanner (Path.of(fname)); // cost= O( ) times= while (sc.hasNext()){ String word = sc.next(); // cost= O( ) times= count++; // cost= O( ) times= vocab.add(word); // cost= O( ) times= } sc.close(); // cost= O( ) times= }catch(IOException e){UI.println("bad"+e);} UI.printf("Words = %,10d Vocab = %,d\n", count, vocab.size()); // Total Cost = O( )
(c): TreeSet
If you used aTreeSet
with n items, what would be the cost?
(Express the cost in big-O notation in terms of n).
int count = 0; // cost= O( ) times= Set<String> vocab = new TreeSet<String>(); // cost= O( ) times= try{ Scanner sc = new Scanner (Path.of(fname)); // cost= O( ) times= while (sc.hasNext()){ String word = sc.next(); // cost= O( ) times= count++; // cost= O( ) times= vocab.add(word); // cost= O( ) times= } sc.close(); // cost= O( ) times= }catch(IOException e){UI.println("bad"+e);} UI.printf("Words = %,10d Vocab = %,d\n", count, vocab.size()); // Total Cost = O( )
Part 2: Decision Trees (Binary Trees) (Weight: 60%)
A Decision Tree is a structure for getting to a decision by following through a tree of yes/no questions.- The root node of the tree is the starting point.
- Each internal node contains a question and two children:
- the Yes child is the subtree to follow if the answer to the question is yes,
- the No child is the subtree to follow if the answer to the question is no
- A leaf node of the tree contains the decision that the tree will give if the answers to the questions led to this point in the tree.
DTNode
class to represent the nodes in the
tree. DTNode.java
is already complete, and defines constructors,
getters, setters, and a draw
method.
The DecisionTree
class is the main program. The setupGUI
method is
written for you and sets up buttons for the methods that you need to
complete. The loadTree
method is also written for you and will
create an initial decision tree in the theTree
field.
You should complete the following methods
Core
-
printTree()
should print out the whole tree.printTree()
should be recursive - printing the text in a node, then (if it has children) printing the yes subtree and then the no subtree.
For example, thesample-animal-tree
should be printed as:
has whiskers? yes:bigger than person? yes:Tiger no:Cat no:has legs? yes:has trunk? yes:Elephant no:Rhino no:Snake
-
runTree()
should "run" the current decision tree by asking the user questions, following down the tree from the root to a leaf, according to the answers. When it gets to a leaf, it should print the answer.
For example, with the sample tree,runTree()
might generate the interaction:
Is it true: has whiskers (Y/N): yes Is it true: bigger than person (Y/N): no The answer is: Cat
-
growTree()
should play an interactive "game" with the user to grow the tree by adding new answers and new questions. For the sample tree, this would be adding new animals that the tree can classify and new questions to distinguish the new animals.
growTree()
should start the same way as runTree
by starting
at the root of the tree, and working down, asking questions,
until it gets to a leaf node.When it is at a leaf, it prints the decision and asks the user if the decision is correct. If the decision was wrong, it
-
- asks the user what the decision should have been,
- asks for a property or question to distinguish the right decision from the wrong one
- changes the text in the node to be the question
- adds two new children (leaf nodes) to the node with the two decisions.
Is it true: has whiskers (Y/N): yes Is it true: bigger than person (Y/N): no I think I know. Is it a Cat? no OK, what should the answer be? Dog Oh. I can't distinguish a Cat from a Dog Tell me something that's true for a Dog but not for a Cat? Property: barks Thank you! I've updated my decision tree.Printing the tree would now produce:
has whiskers? y:bigger than person? y:Tiger n:barks? y:Dog n:Cat n:has legs? y:has trunk? y:Elephant n:Rhino n:Snake
Completion
- Improve
printTree()
to print the decision tree using indentation and y/n to show the tree structure.
For example, thesample-animal-tree.txt
should be printed as:
has whiskers? y:bigger than person? y:Tiger n:Cat n:has legs? y:has trunk? y:Elephant n:Rhino n:Snake
-
saveTree()
should save the decision tree to a file in a way that it can be loaded back in (by theloadTree()
method). The method will need to be recursive.
Question: has whiskers Question: bigger than person Answer: Tiger Answer: Cat Question: has legs Question: has trunk Answer: Elephant Answer: Rhino Answer: Snake
Challenge:
-
drawTree()
should display the tree in the graphics window so that each node is drawn as a rectangle with text in it, and there are lines connecting parent nodes and their child nodes. See a simple example below.
- Yes/No questions are quite limited. Extend the program (including DTNode) to allow multiway questions, for example a node could contain the property: "the color of the stripes", which might lead to five different children, corresponding to the answers "black", "brown", "green", "yellow", and "orange". Further extend these methods:
-
growTree()
should be able to add a new answer to an existing question as well as add a new question. -
runTree()
-
printTree()
-
saveTree()
-
Part 3: Earthquake Sorter (25%)
Earthquake Sorter is a small tool for quickly sorting and analysing the data for earthquakes within New Zealand. The program is able to sort based on various properties of the earthquakes, including magnitude and depth, location, date and time, and ID. TheEarthquakeSorter
program has two classes: - The
EarthquakeSorter
class, which handles the user interaction, stores the list of earthquakes, and contains several methods for sorting the earthquakes according to various metrics.- The user interface has buttons for loading the earthquake data, as well as sorting it in various ways.
- The
loadData
method loads the earthquake data from a data file (two have been provided:earthquake-data.txt
andbigearthquake-data.txt
). - The
sortById
method sorts the list of earthquakes according to their IDs. - The
sortByMagnitude
method sorts the list of earthquakes according to their magnitude, largest first - The
sortByTime
method sorts the earthquakes by date and time, with earlier events occurring first in the list. - The
sortByRegion
method sorts the earthquakes according to the region they occurred in. Within a region, earthquakes are sorted by magnitude (highest first). If two earthquakes have the same region and magnitude, they are sorted by depth (more shallow first). - The
sortByProximity
method asks the user for a latitude and longitude, and then sorts the earthquakes by distance to that location (with closer earthquakes at the top of the list).
- The
Earthquake
class which represents information about a single earthquake.- The constructor in the
Earthquake
class requires a lot of parameters, and populates all of the relevant fields. - The
distanceTo
methods calculate the distance between this earthquake and another earthquake, or this earthquake and a specified latitude and longitude. - The
get
methods return the values of the fields. - The
toString
method provides a nicely formatted string for printing the earthquake. - The
compareTo
method provides the default ordering of earthquakes (by ID). (You need to complete this.)
- The constructor in the
Core
InEarthquake.java
: - Complete the
compareTo
method to provide a natural ordering based on the ID. You may want to see if theString
class has any methods you can use to help you here.
EarthquakeSorter.java
: - Complete the
loadData
method to load the data from a file and store it in the list of earthquakes.- Each line of the data file is one earthquake.
- The file format is: ID date time latitude longitude magnitude depth region
- Each line should be used to create an Earthquake object, which should be added to the list of earthquakes.
- Complete the
sortByID
method to sort the list of earthquakes using their natural ordering. - Complete the
sortByMagnitude
method to sort the list of earthquakes according to their magnitude (largest first).
Completion
- Complete the
sortByTime
method to sort the list of earthquakes according to the date and time that they occurred. - Complete the
sortByRegion
method inEarthquakeSorter.java
to sort the list of earthquakes according to region. If two earthquakes have the same region, they should be sorted by magnitude and then depth.
Challenge
- Complete the
sortByProximity
method inEarthquakeSorter.java
to sort the list of earthquakes according to their proximity (distance) to a given location (closest earthquakes first). The earthquakes should be printed out with their distance to the location.- Hint: You may find the
distanceTo
method inEarthquake.java
useful here.
Note that this is not a particularly challenging challenge!
- Hint: You may find the
bigearthquake-data.txt
file. Use these to check your code and make sure that it is outputting data in the right order. EG: -
output-ID.txt
listing by ID. -
output-Magnitude.txt
listing by Magnitude. -
output-Time.txt
listing by date and time -
output-Region.txt
listing by region, and magnitude then depth within region -
output-Proximity-ChCh.txt
listing by proximity to Christchurch (location -43.5321, 172.6362)
chch.txt
, the UI will automatically input the latitude and longitude data from that file. This may help with testing your program when working on the completion and challenge.