These are the questions for the key/value and document stores.
+Instructions
+For the questions in this section, we will consider a document-oriented database with Yelp data. Imagine there are 3 collections: businesses, users and reviews.
+Please email the answer to jan.aerts@uhasselt.be. Your email should include the answers for each statement like this (obviously mock-up):
+1.1: false
+1.2: true
+1.3: true
+1.4: false
+2.1: true
+2.2: ...
+Note: Explicitly state which are false and which are true. Do not just send a list of the true statements.
Dataset
+{
+ "_key": "tnhfDv5Il8EaGSXZGiuQGg",
+ "_id": "businesses/tnhfDv5Il8EaGSXZGiuQGg",
+
+ // the business's name
+ "name": "Garaje",
+
+ // the city
+ "city": "San Francisco",
+
+ // 2 character state code
+ "state": "CA",
+
+ // star rating
+ "stars": 4.5,
+
+ // number of reviews
+ "review_count": 1198,
+
+ // object, business attributes to values. note: some attribute values might be objects
+ "attributes": {
+ "RestaurantsTakeOut": true,
+ "BusinessParking": {
+ "garage": false,
+ "street": true,
+ "lot": false
+ }
+ },
+
+ // business category: Restaurant, Plumber, ...
+ "category": "Restaurant"
+}
+{
+ "_key": "zdSx_SD6obEhz9VrW9uAWA",
+ "_id": "reviews/zdSx_SD6obEhz9VrW9uAWA",
+
+ // user id, maps to the user in users collection
+ "user_id": "users/Ha3iJu77CxlrFm-vQRs_8g",
+
+ // business id, maps to business in businesses collection
+ "business_id": "businesses/tnhfDv5Il8EaGSXZGiuQGg",
+
+ // star rating
+ "stars": 4,
+
+ // date of review
+ "date": {
+ "year": 2016,
+ "month": 3,
+ "day": 9
+ },
+
+ // number of useful votes received
+ "useful": 15,
+
+ // the review itself
+ "text": "Great place to hang out after work"
+}
+{
+ "_key": "Ha3iJu77CxlrFm-vQRs_8g",
+ "_id": "users/Ha3iJu77CxlrFm-vQRs_8g",
+
+ // the user's first name
+ "name": "Sebastien",
+
+ // the number of reviews they've written
+ "review_count": 56,
+
+ // when the user joined Yelp
+ "yelping_since": {
+ "year": 2011,
+ "month": 1,
+ "day": 1
+ },
+
+ // number of fans the user has
+ "fans": 1032,
+
+ // the years the user was elite
+ "elite": [
+ 2012,
+ 2013
+ ],
+
+ // average rating of all reviews
+ "average_stars": 4.31
+}
+We will make the following assumptions:
+- All documents are well-formed, and therefore have the same schema. In other words: all keys are present in all documents (e.g. attributes is not missing from one of the businesses).
+- There are users who have written no reviews and there are businesses that have received no reviews.
+
+
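To make the links between the three collections concrete, here is a minimal Python sketch. It is not ArangoDB code: the `collections` dictionary and the `lookup` helper are hypothetical stand-ins, and the documents are trimmed to a few fields from the examples above. It only illustrates that, in the example documents, `_id` is the collection name followed by `/` and the `_key`, and that `user_id` and `business_id` in a review hold such `_id` values.

```python
# Hypothetical in-memory stand-in for the three collections (not ArangoDB).
# Each collection maps a document's _key to a trimmed copy of the document.
collections = {
    "businesses": {
        "tnhfDv5Il8EaGSXZGiuQGg": {"name": "Garaje", "city": "San Francisco"},
    },
    "users": {
        "Ha3iJu77CxlrFm-vQRs_8g": {"name": "Sebastien", "fans": 1032},
    },
    "reviews": {
        "zdSx_SD6obEhz9VrW9uAWA": {
            "user_id": "users/Ha3iJu77CxlrFm-vQRs_8g",
            "business_id": "businesses/tnhfDv5Il8EaGSXZGiuQGg",
            "stars": 4,
        },
    },
}


def lookup(doc_id):
    """Resolve a '<collection>/<_key>' id to a document, or None if absent."""
    collection, _, key = doc_id.partition("/")
    return collections.get(collection, {}).get(key)


# Follow the references stored in a review to its author and its business.
review = lookup("reviews/zdSx_SD6obEhz9VrW9uAWA")
author = lookup(review["user_id"])
business = lookup(review["business_id"])
print(author["name"], "reviewed", business["name"], "with", review["stars"], "stars")
```

The queries in the questions below express the same kind of link with conditions such as `r.user_id == u._id`.
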
Question 1
+Consider the following query:
+FOR r IN reviews
+COLLECT m=r.date.month AGGREGATE u=MAX(r.useful)
+LIMIT 5
+SORT u DESC
+RETURN {m:m, u:u}
+Which of the following statements are true? Attention: there might be none, there might be more than one.
+Possible answer 1.1 - This query shows the 5 months with the highest number of useful votes their reviews received.
+Possible answer 1.2 - This query shows the 5 most useful reviews.
+Possible answer 1.3 - This query will return a value for each month of the year, even if there are no reviews in that month.
+Possible answer 1.4 - This query shows 5 random months together with the highest number of useful votes a review in them received.
+Question 2
+Consider the following query:
+FOR u IN users
+FOR r IN reviews
+FILTER r.user_id == u._id
+FILTER r.stars < (u.average_stars/2)
+RETURN {n:u.name,us:u.average_stars,s:r.stars}
+Which of the following statements are true? Attention: there might be none, there might be more than one.
+Possible answer 2.1 - This query will only return results for users who have written reviews.
+Possible answer 2.2 - All users will appear in the results.
+Possible answer 2.3 - This query returns a result for each review where the user gives less than half of their average number of stars.
+Possible answer 2.4 - This query will return the same results if the first two lines were swapped (i.e. first FOR r IN reviews, then FOR u IN users).
Question 3
+Consider the following query:
+FOR u IN users
+SORT u.fans DESC
+LIMIT 1
+RETURN {a:u.name, b:u.average_stars}
+Which of the following statements are true? Attention: there might be none, there might be more than one.
+Possible answer 3.1 - The result is not deterministic because there might be multiple users with an equal number of fans.
+Possible answer 3.2 - This query returns the name and average stars given for the user with the fewest fans.
+Possible answer 3.3 - This query returns the name and average stars given for the user with the most fans.
+Possible answer 3.4 - The result is independent of the maximum number of stars a user gave in their reviews.
+Question 4
+Consider the following query:
+FOR b IN businesses
+FILTER b.state == "CA"
+RETURN DISTINCT {
+ name: b.name,
+ stars: (
+ FOR r IN reviews
+ FILTER r.business_id == b._id
+ FILTER r.date.year == 2016
+ RETURN r.stars
+)}
+Which of the following statements are true? Attention: there might be none, there might be more than one.
+Possible answer 4.1 - This returns the name of each business in California, plus an array of the stars they received in 2016. If a business didn’t have a review in 2016, that business is not included in the output.
+Possible answer 4.2 - This returns the name of each business in California, plus an array of the stars they received in 2016. If a business didn’t have a review in 2016, an empty array is returned for the stars.
+Possible answer 4.3 - The DISTINCT has no effect on the output and could have been removed.
Possible answer 4.4 - The SORT r.stars DESC has no effect on the output and could be removed.
Question 5
+Which of the following queries returns the take-out restaurant with the highest number of reviews in 2018? The output should be a single object and look like this:
+{
+ "_key": "GBTPC53ZrG1ZBY3DT8Mbcw",
+ "_id": "businesses/GBTPC53ZrG1ZBY3DT8Mbcw",
+ "name": "Luke",
+ "city": "New Orleans",
+ "state": "LA",
+ "stars": 4,
+ "review_count": 4554,
+ "attributes": {
+ "RestaurantsReservations": "True",
+ "RestaurantsTakeOut": "True"
+ },
+ "category": "Restaurant"
+}
+Attention: there might be none, there might be more than one.
+Possible answer 5.1
+FOR a IN businesses
+FILTER a.attributes.RestaurantsTakeOut == "True" AND a.category == "Restaurant"
+SORT a.review_count DESC
+LIMIT 1
+RETURN a
+Possible answer 5.2
+LET a = (
+ FOR b IN reviews
+ FILTER b.date.year == 2018
+ COLLECT c = b.business_id WITH COUNT INTO cnt
+ SORT cnt DESC
+ LIMIT 1
+ RETURN DOCUMENT(c)
+)
+
+FOR d IN a
+FILTER d.attributes.RestaurantsTakeout == "True"
+FILTER d.category == "Restaurant"
+RETURN d
+Possible answer 5.3
+FOR r IN reviews
+FOR b IN businesses
+FILTER r.business_id == b._id
+FILTER r.date.year == 2018
+FILTER b.category == "Restaurant"
+FILTER b.attributes.RestaurantsTakeOut == "True"
+COLLECT c = r.business_id WITH COUNT INTO d
+SORT d DESC
+LIMIT 1
+RETURN DOCUMENT(c)
+Possible answer 5.4
+FOR b IN businesses
+FILTER b.attributes.RestaurantsTakeOut == "True"
+FILTER b.category == "Restaurant"
+SORT b.review_count DESC
+LIMIT 1
+RETURN b.name
+Question 6
+Which of the following queries results in a list of unique business categories? It would look like this:
+["Restaurant","Plumber","Beauty & Spas","Gunsmith","Wedding Planner"]
+Attention: there might be none, there might be more than one.
+Possible answer 6.1
+FOR b IN businesses
+COLLECT c=b.category
+RETURN c
+Possible answer 6.2
+FOR b IN businesses
+RETURN DISTINCT b.category
+Possible answer 6.3
+LET categories = (
+ FOR b IN businesses
+ RETURN b.category
+)
+FOR c IN categories
+RETURN DISTINCT c
+Possible answer 6.4
+FOR c IN (
+ FOR b IN businesses
+ RETURN b.category
+)
+RETURN DISTINCT c
+
+
+### Expected for the emerge stage
+For the emerge phase, we expect 5 to 10 sketches that either combine different sketches, or take certain sketches a step further. Again: refer to the teaching material for inspiration on different ways to combine sketches.
+
+### Specific instructions
+As we did in the group session, please:
+
+* put your initials in the top-right corner of each sketch
+* number each sketch in the top-left corner
+* clearly indicate what each mark means
+* for the emerge sketches, indicate which sketch(es) from the diverge (or emerge) phase are combined
+
+## Implementation
+For the implementation part, we will use the energy dataset that we used for the exercises during the year. This is to make sure that you have access to the data. There are two designs that you need to implement. Specific instructions as well as the designs are available at [https://datavis-exercises.vercel.app/resit_project](https://datavis-exercises.vercel.app/resit_project).
+
+You have two choices to obtain these instructions:
+
+1. Update your existing website by following the [Receiving new instructions](https://datavis-exercises.vercel.app/instructions/working_on_exercises) section.
+2. Create a fresh website for this term:
+ 1. Create a new fork of the [exercise repository](https://gitlab.com/vda-lab/datavis_exercises).
+ 2. Create a Vercel deployment for your new fork.
+ 3. Send us your new GitLab and Vercel URLs by email.
+
+**Do not delay asking for help if you run into issues at this stage!**
+
+## How to submit
+For the **designs**, we want you to submit a single zip-file which contains 2 folders: one called "diverge" with pictures of your diverge sketches, and one called "emerge" with pictures of your emerge sketches. We will create a Toledo/Blackboard assignment where you can upload them.
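
One possible way to build that zip-file is sketched below in Python, using only the standard library. The function name `build_submission` and its parameters are hypothetical; the sketch assumes your sketch pictures already sit in two local directories, and it stores them under folders named "diverge" and "emerge" inside the archive regardless of what the local directories are called.

```python
import zipfile
from pathlib import Path


def build_submission(diverge_dir, emerge_dir, zip_path):
    """Write one zip-file holding a 'diverge' and an 'emerge' folder."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder_name, source in (("diverge", diverge_dir), ("emerge", emerge_dir)):
            # Add every regular file, in a stable order, under the required folder name.
            for picture in sorted(Path(source).iterdir()):
                if picture.is_file():
                    archive.write(picture, arcname=f"{folder_name}/{picture.name}")
    return zip_path
```

For example, `build_submission("my_diverge_photos", "my_emerge_photos", "designs.zip")` would produce a single archive to upload; the directory and file names in this call are placeholders.
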
+
+For the **implementation**, we have created an additional folder in the git repository ("resit_project"), just like we did for the final visualisations in May. Remember that your visualisations have to show up on Vercel to get graded.