Home
- The goal of the workshop is to produce a 3D visualization of a toy dataset of Twitter status updates.
- The main way I will test your understanding of the material covered so far is to ask you to tweak the code examples I give in some way to change the output.
- If you're stuck - don't worry! The workshop is designed to give you a taste of what's possible with Processing and to encourage you to follow up if you find it interesting.
- Download the code examples for this workshop from https://github.com/eamonnbell/dcip-workshop/archive/master.zip
- Unzip this file to somewhere handy and navigate to the `ex0` subfolder.
- Double-click on `ex0.pde` to open the Processing sketch.
A Processing sketch is not just a single file. It's:
- a main sketch file: SKETCH_NAME.pde
- other .pde files defining custom classes
- other .java files and data files
- ...all inside a folder that has to be called SKETCH_NAME
You can think of Processing as a subset of Java. So if you know Java, you know most Processing syntax. If you don't, no worries!
❓ Before you run the `ex0` sketch, what do you think it does?
❓ Run the `ex0` sketch with the menu (or Ctrl-R). What does it do?
println() is a function. We know that 'cause it has parentheses in the mix. It takes stuff and does things with it. In this case it sends the stuff you tell it to the "console", which is the lower part of the Processing environment. Stuff is a big category:
- strings - delimited by double quotation marks `"`
- numbers - we only have to worry about integers (whole numbers) in this workshop
👉 Modify the code so that it sends the text `Hello world!` to the console ten times in a row. Be as clever or as straightforward as you like.
We're done with the ex0 sketch. Close it out and open up the ex1 sketch by finding the right folder and opening up the sketch code.
From now on, I'm just going to assume that when we move on to Exercise n, you close down everything and open up the sketch exn.
The purpose of this exercise is to introduce you to certain fundamental programming concepts and how they are used in Processing.
Functions like println() do stuff. The stuff they do is already pre-determined by Processing.
But you can define your own functions that do stuff. Mostly by calling on other functions, but it's still useful. The syntax for function definition involves brackets - the curly one: {. It looks something like this:
```java
void my_function() {
  // stuff my_function does
}
```
The keyword void indicates that the function does not provide a result to its caller.
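To make the contrast with void concrete, here is a minimal sketch in plain Java (the language Processing is built on). The names `FunctionDemo`, `greet` and `doubleIt` are made up for illustration; in a Processing sketch you would write the two functions at the top level, without the surrounding class or the `static` keyword.

```java
class FunctionDemo {
    // void: the function does something, but hands nothing back to its caller
    static void greet() {
        System.out.println("Hello world!");
    }

    // int: the function hands a whole number back to its caller
    static int doubleIt(int n) {
        return n * 2;
    }

    public static void main(String[] args) {
        greet();                       // nothing to store, the call just does its job
        int result = doubleIt(21);     // the returned value can be stored in a variable
        System.out.println(result);
    }
}
```

The only difference in the definitions is the word in front of the function name: it names the kind of stuff the function gives back, or void if it gives back nothing.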
Most Processing sketches need two core functions whose actions you define. Otherwise they don't run.
- `setup()`
- `draw()`
We look at each in turn.
The setup() function is called once, when the sketch is run.
❓ Look inside the definition of the `setup()` function in `ex1`. What do you think it does?
Oh, yeah. Semicolons. Lovingly misused in English by millions daily. In Processing (and some other languages) they're like a full stop. They say: here's the end of an instruction for you, expect another.
👉 Make the drawing window really tall and skinny. Now make it really short and fat. What does this tell you about the order of the stuff that `size()` expects you to give it?
👉 Give `size()` stuff that makes it break. What does Processing tell you has gone wrong? How useful is that information?
It makes sense that size() goes in the setup() function definition, since we normally want the size of the window to stay the same. The background() function paints the whole window one color. That color is (255, 255, 255) a.k.a. white.
For this workshop, colors seriously don't matter. If you know what RGBA is, great. If you know HTML colors then you can use them in Processing too, any time a function works with color-stuff. When in doubt, choose Papaya Whip (#FFEFD5). Actually, always use Papaya Whip.
The draw() function is called on every frame render, pretty much as frequently as is possible.
OK, so Processing is like an animator's flip book. Any drawing that happens in the draw() function gets done over and over again, like on every page turn.
That way, if we fiddle with the positions of things on the fly, we get the illusion of motion. But if we don't, it looks like things are static. Which is fine, but it's important to remember that all those red squares are really being redrawn up to 60 times a second.
Another metaphor. Processing is also a bit like a kid who can only handle one color (or thickness, kind, etc.) of crayon at a time. So, for instance, to get Processing to build up a complex image with many components, imagine instructing this kid:
- pick up a green pen
- draw a rectangle
- pick up a red pen (implicitly, kid puts down green pen, right?)
- draw a bunch more rectangles
❓ Now look at the code in the `draw()` function. If colors are specified by a sequence of three numbers which tell you the balance of red, green, and blue in the mix respectively, what function name corresponds roughly to the 'pick up a ____ pen' effect described above?
👉 Mess with the stuff that the first `rect()` function operates on (the stuff between parentheses) so that you make the green square larger.
The "stuff" that a function operates on is called its arguments or parameters.
rect(top_left_x, top_left_y, bottom_right_x, bottom_right_y) draws a rectangle:
- whose top left corner is top_left_x pixels over from the top left of the window
- whose top left corner is __________ pixels down from the top left of the window
- whose bottom right corner is __________ pixels over from the top left of the window
- whose bottom right corner is bottom_right_y pixels down from the top left of the window
This reading of the arguments holds when the sketch sets rectMode(CORNERS). In Processing's default mode, the last two arguments are the rectangle's width and height instead.
Processing has a whole slew of optional functions with special names whose contents you get to define. Processing understands the special name of the function and executes its contents under certain pre-defined circumstances.
❓ Under what circumstances do you think the instructions defined in a function called `keyPressed()` are run? How about `mouseDragged()`?
Because they "handle" what happens when a certain event (user, system, etc.) happens these special functions are sometimes called event handlers.
👉 Run `ex1` and click your mouse in the draw window and drag it around.
What's going on inside this function?
First we declare three variables by telling Processing what kind of data (stuff) they hold. In all three cases, the variable is a whole number (integer) so the type declaration is int.
Just as there are special event handler functions with reserved names, there are a couple of special variables that are reserved so that Processing can expose certain details of the sketch to you in the code without your having to write any functions to collect that data. When you type any of these reserved variable names, the Processing code editor will conveniently highlight them in pink.
❓ What are the special variables used in `ex1`? What information do you think each of them contains?
ellipse() draws an ellipse centered on the co-ordinates specified in the first two arguments. The next two arguments give the ellipse's width and height (in the default ellipse mode).
A circle is a special case of an ellipse where the width and the height are the same.
👉 Modify the arguments to the `ellipse()` function so that, instead of drawing blue circles when the mouse is dragged, it draws an oval shape in the color of your choice.
👉 Did you notice the "extra" fourth argument to the `fill()` function? Fiddle with it and find out what it does.
This is the "Hello, world!" of 3D drawing: making a box appear on screen. Notice that we have added an additional argument to the size() function in the setup() code block, which tells Processing how we want it to deal with drawing in 3D.
❓ Skim the code for `ex2`. Which line contains the function that draws the box?
👉 Meddle with the arguments for the function that draws the box. What does each parameter do?
Unlike rect(), 3D drawing functions like box() tend not to take positions as arguments. In general, their arguments specify the geometry of the object being drawn: its dimensions and shape.
As a consequence, we can think of drawing 3D objects to the screen as a two step process:
- tell Processing where we want the object to appear
- tell Processing what object of what size/dimension
The model of the kid with one pen at a time still holds, so we can modify the above list to look something like this:
- tell Processing what kind of color, texture, etc.
- tell Processing where we want the object to appear
- tell Processing what object you want of what size/dimension
Step 2. is achieved by applying transformations to the scene.
Translation is a linear motion along each of the three axes.
Just as in 2D, the origin (0, 0, 0) is in the top left corner of the drawing window. So to get our cube into the center of the frame, we translate() it
- over `width / 2` pixels from the left
- down `height / 2` pixels from the top
- +/- 0 pixels towards the viewer
👉 Change the value in the third argument to the `translate()` function in `ex2`. If the third argument is larger, then the box appears ________.
👉 Add another box to the scene
Unsurprisingly,
- `rotateX()` rotates everything around the X-axis
- `rotateY()` rotates everything around the Y-axis
- `rotateZ()` rotates everything around the Z-axis
Arguments to the rotate_() functions are specified in radians. There are 2π radians in a full circle.
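If you are more used to degrees, the conversion is a one-liner. The sketch below is plain Java with a made-up helper called `toRadians`; Processing also provides a built-in radians() function and the constants PI and TWO_PI, so in a real sketch you would not need to write this yourself.

```java
class Radians {
    // A full circle is 360 degrees or 2π radians, so one degree is π / 180 radians.
    static double toRadians(double degrees) {
        return degrees * Math.PI / 180.0;
    }

    public static void main(String[] args) {
        System.out.println(toRadians(360)); // one full turn
        System.out.println(toRadians(90));  // a quarter turn
    }
}
```

So a quarter turn around the Y-axis is an argument of π/2, and anything beyond 2π just goes around again.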
👉 Modify the arguments of at least two of the rotate functions so that the cube changes its orientation in space in response to the location of the mouse cursor. Hint: Processing provides special variables with reserved names that are highlighted pink in the code editor, which hold information concerning the sketch, including mouse position.
👉 Bonus: Here is the syntax for a function provided by Processing called `map(value, start1, stop1, start2, stop2)`. What do you think it does? Use it to modify `ex2` so that one traversal of the entire screen by the mouse on one axis corresponds to exactly one 360° rotation of the cube in some direction.
From the Processing documentation:
Transformations are cumulative and apply to everything that happens after and subsequent calls to the function accumulates the effect. For example, calling translate(50, 0) and then translate(20, 0) is the same as translate(70, 0). Likewise, calling rotate(PI/2.0) once and then calling rotate(PI/2.0) a second time is the same as a single rotate(PI).
(From https://processing.org/reference/translate_.html, https://processing.org/reference/rotate_.html)
All transformations are reset when draw() begins again.
This is a really important thing to note.
One advantage of 3D visualizations is that they allow the user to inspect representations of data from a variety of different angles. Normally, we consider the user as the operator of a camera which looks in on the scene.
By making the rotation of the cube respond to mouse input we can emulate the movement of a camera. In order to do that we have to take into consideration the relative motion of all the components in the scene, which is tiresome. Processing provides a camera() function, but even that is a little bit cumbersome: we would have to write event handlers for mouse and keyboard input.
Fortunately, we can take advantage of a contributed library called PeasyCam, which will take care of all this on our behalf. By adding a couple of lines of code to our sketch, we have a sensible camera model which is controlled by mouse input.
We don't have time to describe every aspect of the newly-added code. However, it's important to understand the camera model that PeasyCam uses and the mouse bindings used to control the camera.
From the PeasyCam documentation:
a mouse left-drag will rotate the camera around the subject, a right drag will zoom in and out, and a middle-drag will pan. A double-click restores the camera to its original position. The PeasyCam is positioned on a sphere whose radius is the given distance from the look-at point. Rotations are around axes that pass through the looked-at point.
The default looked-at point has implications for how we arrange our scene.
❓ Run the sketch and move the camera around using the mouse. Notice that the camera is not directed at the cube we want to view. What is the location of the cube in 3D space? What is the location of the looked-at focal point of PeasyCam?
👉 Change the arguments to the translate function so that the cube is now back in the center of the camera's notional field of view.
(We can also add arguments to the PeasyCam constructor to change the looked-at point.)
Exercise 3 is an example of the use of translate() within a for-loop to build up an object that is a composite of a number of simpler 3D drawing functions - in this case: box().
The goal of this exercise is to introduce the functions pushMatrix() and popMatrix().
In this scene we have a cube of cubes. Imagine how we would put this together. First we'd build out a line of n cubes along the x-axis. Then we'd like that line to be replicated n times in a direction orthogonal to the x-axis. That gives us a sheet of cubes, which we'd finally like to be replicated n times in the direction orthogonal to that sheet. This gives us a total of n³ cubes. Cute.
To achieve the cube of cubes, we could just compute the expected co-ordinates of each cube and apply a translation for each cube. But we want to 1. reduce the number of transformation operations to the minimum; and 2. model the generative process described above accurately.
To do so, we introduce the idea of a matrix stack.
The `pushMatrix()` function saves the current coordinate system to the [matrix] stack and `popMatrix()` restores the prior coordinate system. `pushMatrix()` and `popMatrix()` are used in conjunction with the other transformation functions and may be embedded to control the scope of the transformations.
Imagine doing a translation operation, translate(10, 0, 0). This moves the "pen" in 3D space +10 pixels along the x-axis away from the true origin. Then I call pushMatrix(). It's like resetting the origin for all subsequent translations. So when I call translate(0, 10, 0), and draw something, the true or absolute position is (10, 10, 0). So we can say that the translation operation after the call to pushMatrix() was reckoned with respect to the "new" temporary origin (10, 0, 0).
Every pushMatrix() call must have a corresponding popMatrix() call after you've done all your operations in the new local co-ordinate system. Nesting subsequent calls to pushMatrix() and popMatrix() allows you to "scope" transformations, just like with variables. Basically, it means that transformations have a local effect.
Another way to think about stacking transformations is to notice that they can be used to group 3D objects in a scene, where groups are implicitly defined by the local scopes of transformations. That way, we can give objects in a group a common fate which is coherent with their identity as a collection. In turn, groups of groups (nested calls to pushMatrix()) can be used to move the groups wholesale throughout the space.
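The bookkeeping behind this can be sketched in plain Java. The toy class below is an assumption-laden model, not Processing's own code: it only handles translations (no rotation or scaling), and the names `OriginStack`, `push`, `pop` and `current` are made up to mirror translate(), pushMatrix() and popMatrix().

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class OriginStack {
    // Saved origins, most recent on top (hypothetical stand-in for the matrix stack)
    private final Deque<float[]> saved = new ArrayDeque<>();
    private float[] origin = {0, 0, 0};

    // Like translate(): shifts the current origin, and shifts accumulate
    void translate(float x, float y, float z) {
        origin = new float[] {origin[0] + x, origin[1] + y, origin[2] + z};
    }

    // Like pushMatrix(): remember where the origin is right now
    void push() {
        saved.push(origin.clone());
    }

    // Like popMatrix(): go back to the most recently remembered origin
    void pop() {
        origin = saved.pop();
    }

    float[] current() {
        return origin.clone();
    }

    public static void main(String[] args) {
        OriginStack s = new OriginStack();
        s.translate(10, 0, 0);   // move the "pen" along the x-axis
        s.push();                // remember (10, 0, 0)
        s.translate(0, 10, 0);   // local move, reckoned from (10, 0, 0)
        System.out.println(Arrays.toString(s.current())); // [10.0, 10.0, 0.0]
        s.pop();                 // the local move is forgotten
        System.out.println(Arrays.toString(s.current())); // [10.0, 0.0, 0.0]
    }
}
```

Everything between push() and pop() behaves like a local scope: whatever was drawn there keeps its position, but later drawing starts again from the remembered origin.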
The goal of this exercise is to introduce some features of Processing that allow for object-oriented programming. We'll do that by trying to achieve the same effect as Exercise 4 by different means.
Before we dive into the code, we have to introduce a new class unique to Processing that is exceptionally useful: PVector. It's so commonly used, we could even consider it a data type fundamental to Processing, like String or ArrayList.
A `PVector` describes a two or three dimensional vector, specifically a Euclidean (also known as geometric) vector. A vector is an entity that has both magnitude and direction. The datatype, however, stores the components of the vector (x, y for 2D, and x, y, z for 3D).
❓ What kinds of physical properties/quantities are suitable for representation in a `PVector`?
The next thing we'll do is introduce the syntax for creating classes in Processing. A class is a cookie-cutter for a thing of a certain kind. It provides the blueprints for complex objects which have (and I'm not using these terms in any technical way):
- properties
- behaviors
So a car has a bunch of properties: stick-shift or automatic; two-wheel or all-wheel drive; maximum speed; number of seats... We use variables to keep track of these kinds of properties in a class which models a car. When variables are used in this way, inside a class, they are sometimes called fields.
It also has a load of behaviors. We define functions inside the car class to perform these behaviors. Functions defined inside classes are sometimes called methods. There is a special syntax for calling class methods. If we have an instance of a Car class called my_car, and Car models the driving behavior with a method called drive, we use the "dot syntax" to call that method as in:
```java
// Declaration and constructor call
Car my_car = new Car();
my_car.drive();
```
A little bit like when we started using Processing and had to define some special functions (setup() and draw()), to code a fully functioning class in Processing (Java) we generally define a special method that has the same name as the class. This special method is called the constructor. It is like the "boot-up" sequence for the class any time it gets instantiated.
Look at the class definition for a class called Box (all classes get names that start with capital letters) and notice where the fields, methods, and constructor are defined.
```java
class Box {
  // FIELDS
  PVector location;
  PVector size;

  // CONSTRUCTOR
  Box(PVector location_) {
    location = location_;
    size = new PVector(15, 15, 15);
  }

  // METHODS (just one)
  void display() {
    pushMatrix();
    translate(location.x, location.y, location.z);
    box(size.x, size.y, size.z);
    popMatrix();
  }
}
```
Now, we need to look at what happens in the code for ex4a now that we have this class defined. First things first, notice the variable declarations at the top of the file. There's a new syntax we haven't seen before:
```java
ArrayList<Box> boxes = new ArrayList<Box>();
```
This is a more complicated type declaration which says:
- there's going to be an `ArrayList` (basically a list)...
- which is only allowed to have instances of class `Box` inside
- and it's called `boxes`
- and its initial value is an empty list which can only handle `Box` objects
Phew. Don't worry about the details. Just remember that in this sketch there's a list of boxes (called boxes) that every function can "see" and update using a method called .add().
👉 Write a new class called `Blob` which displays a sphere on the screen at `PVector location` with radius `int radius`. You are going to have to:
- add a variable declaration for the field `radius`
- change the constructor so that it initializes the `radius` field with an integer
If you have done any object-oriented programming before, you know that normally we would introduce the notion of class inheritance at this point. Instead, I want to introduce you to the concept of an interface in Java. If a class is a kind of blueprint which specifies the expected fields and methods of an object, then an interface is a kind of blueprint for a class that specifies the names of the methods that classes (which implement that interface) are expected to provide.
Interfaces allow us to stipulate that we expect a certain group of classes to have some specific methods. It doesn't specify how the method should be implemented, or what it should do. Using an interface only acts as a promise that a class will implement those methods if called (collectively described as that class's interface).
We have two classes now, Box and Blob.
👉 Change the code inside the nested `for`-loops so that a new `Blob` is added to the `boxes` ArrayList. What happens? What does the error message mean?
We want to modify the ArrayList in such a way that it supports adding classes that represent objects that we want to .display() on the screen, so that we can add Box or Blob or any other custom class we care to define after the fact.
To do this, we first define the interface which promises that all classes which implement it will have a display() method. The syntax for interface definition is similar to the syntax for class definition.
```java
interface DisplayPrimitive {
  void display();
}
```
Now, we modify the class definitions of Box and Blob to reflect the fact that they implement this particular interface, using the implements keyword.
```java
class Box implements DisplayPrimitive {
  // et cetera
}

class Blob implements DisplayPrimitive {
  // et cetera
}
```
Then, we change our ArrayList declaration so that it expects to contain a list of DisplayPrimitive rather than any particular class (i.e. Box or Blob).
```java
ArrayList<DisplayPrimitive> display_primitives = new ArrayList<DisplayPrimitive>();

void setup() {
  // et cetera
  display_primitives.add(b);
}
```
Finally, we have to change our type declarations in the fancy for-loop so that it refers to the interface, rather than any particular class.
```java
void draw() {
  // et cetera
  for (DisplayPrimitive dp : display_primitives) {
    dp.display();
  }
}
```
👉 Test the flexibility of the `DisplayPrimitive` interface by creating your own custom class which `implements DisplayPrimitive`, and replace the constructor call to `Box` or `Blob` in the nested `for`-loop with one that creates your own object.
❓ In the context of a data visualization, what is the usefulness of the basic abstraction provided by the `DisplayPrimitive`?
Exercise 4b is just the completed DisplayPrimitive implementation of the drawing system.
We need to start getting data into Processing. We will be reading data from a .csv (comma-separated values) file. We store data files in a data subdirectory in the sketch folder.
👉 Take a look at the demo data here.
Fortunately, Processing provides built-in objects and methods for reading tabular data from the disk.
This example, though short, introduces some important concepts in Java/Processing.
❓ If `int number_of_balloons;` tells Processing that the variable `number_of_balloons` is of type `int` (integer = whole number), what does `Table t;` do?
Notice that this object declaration is outside any of the core function definitions. This is because we need it to be "visible" to both the setup() function, and later, the draw() function (even though there's nothing there yet).
loadTable() is a convenience function that returns a Table object. The first argument to loadTable() says where the file is located; the second argument to loadTable() is optional. If it's included, and set to "header", then the table loader knows that the first row of the .csv file being loaded contains the column header names.
❓ Examine the snippet of code below. What functions have you seen before? What do they do? Can you guess what this snippet does?
The colon can be read as "in". So:
for (each) row in the rows of the Table called t:
- print to the console the contents of the "amount" field (understood as an int)
```java
for (TableRow row : t.rows()) {
  println(row.getInt("amount"));
}
```
This exercise is a combination of the drawing system developed in Exercise 4b and the data ingestion code from Exercise 5. Let's focus on the guts of the exercise that are most new.
```java
// ...
t = loadTable("data/data.csv", "header");
for (TableRow row : t.rows()) {
  PVector random_location = PVector.random3D();
  random_location.mult(100);
  Box b = new Box(random_location);
  display_primitives.add(b);
}
// ...
```
❓ What do you think the method `PVector.random3D()` returns? We know that it has to return a `PVector` (what's that?) because of the type declaration `PVector`. We're multiplying it by a scalar... What is the scalar and why? To find out, change the scalar factor to 1 and observe the effect.
❓ Based on your reasoning about all the elements included in the sketch so far, what does this entire sketch do?
The whole point of data visualization is to make the visual elements of the display correspond to properties of the data. In this exercise we aim to change the size of the visual element corresponding to each row of the data table in accordance with the amount field. If the amount is large: draw a large box. To accommodate this, we need to update the Box class so that it doesn't use a hard-coded size. We can write multiple constructors for different situations, so we've done that.
```java
Box(PVector location_, PVector size_) {
  location = location_;
  size = size_;
}
```
Now, when a Box is constructed with one argument, it will default to a size of 15 in each dimension. When a Box is constructed with two arguments, it will expect both a location and size PVector.
Then, we need to modify the code that runs in the fancy for-loop as it iterates through the rows of the table. We ask the table object for an integer representation of the value in the amount column for each row...
```java
int amount = row.getInt("amount");
// ... initialize a PVector with that amount in each component of the vector
PVector size = new PVector(amount, amount, amount);
// ... initialize a new Box using the new two-argument constructor
Box b = new Box(random_location, size);
// ... and add that Box to the list of display_primitives as before
display_primitives.add(b);
```
In this example, we just use the value of the amount column directly to set the size of the displayed element. But what if that value is very large? Or very small? The scale of the data matters; in almost all instances you will have to rescale the input data when it is being used to modify a display element parameter. Fortunately, Processing provides a map() function which takes a number of arguments to linearly rescale a variable from one range to another.
For example, suppose an integer some_data ranges from 25 to 5000 and is to be represented by the size of a display element whose minimum size is 10 pixels (or whatever) and whose maximum size is 50 pixels. We would write code like the following:
```java
int some_data;
float size = map(some_data, 25, 5000, 10, 50);
```
You might have realized that we have to know in advance the max and min values that we expect some_data to have. Because of time, we can't go into how we can get (or estimate) that information in advance. Suffice it to say that more complicated data structures or custom classes might be useful for storing and scaling data in this way.
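As a rough illustration of both points, here is a plain-Java sketch of the usual linear-rescaling formula together with one simple way to find the range: scan the data once before drawing anything. The class `Rescale`, the helper `minMax` and the sample values are made up for this example, and the `map` written here is a stand-in with the same argument order as Processing's map(), not Processing's own source.

```java
class Rescale {
    // Linearly rescales value from the range [start1, stop1] to the range [start2, stop2].
    // If start1 equals stop1 the division by zero gives NaN or infinity, so the input range must not be empty.
    static float map(float value, float start1, float stop1, float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // Scans the data once and returns {min, max}. Assumes data has at least one element.
    static int[] minMax(int[] data) {
        int min = data[0];
        int max = data[0];
        for (int d : data) {
            if (d < min) min = d;
            if (d > max) max = d;
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] amounts = {25, 400, 5000};   // made-up values standing in for a data column
        int[] range = minMax(amounts);     // first pass: find the range of the data
        for (int amount : amounts) {       // second pass: rescale each value to a size
            float size = map(amount, range[0], range[1], 10, 50);
            System.out.println(amount + " -> " + size);
        }
    }
}
```

The smallest value lands on the minimum size and the largest on the maximum, with everything else spread proportionally in between.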
So now we have a size for the object that represents the data in some way, how about the position of the object? Position in space is useful for representing ordinal relationships like chronology, relative strength/correlation etc. In the present workshop we won't develop any spatial layout methods (though see my repo here for some examples).
In Exercise 7, we assign the cubes to a random location on the sphere of radius 100 which has its center on the origin. To achieve this we use a PVector class method called random3D() which returns a three-dimensional PVector that is situated somewhere on the unit sphere. Then we call the method mult(100) on this PVector, which performs (in-place) scalar multiplication on that vector.
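The two steps described above (pick a direction at random, then scale it) can be sketched in plain Java. This is one common way to get a random point on a sphere, by normalizing three Gaussian random numbers; it is an assumption for illustration and may well differ from how Processing implements random3D() internally. The names `SpherePoint`, `randomOnSphere` and `magnitude` are made up.

```java
import java.util.Random;

class SpherePoint {
    // Returns a random point on the sphere of the given radius centered on the origin.
    static double[] randomOnSphere(Random rng, double radius) {
        double x, y, z, length;
        do {
            // Three independent Gaussian values give a direction with no preferred axis
            x = rng.nextGaussian();
            y = rng.nextGaussian();
            z = rng.nextGaussian();
            length = Math.sqrt(x * x + y * y + z * z);
        } while (length == 0.0); // vanishingly unlikely, but avoids dividing by zero

        // Dividing by the length puts the point on the unit sphere;
        // multiplying by the radius is the scalar multiplication step.
        double scale = radius / length;
        return new double[] {x * scale, y * scale, z * scale};
    }

    // Distance of the point from the origin
    static double magnitude(double[] v) {
        return Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double[] p = randomOnSphere(rng, 100);
        System.out.println(p[0] + ", " + p[1] + ", " + p[2]);
        System.out.println("distance from origin: " + magnitude(p));
    }
}
```

Whatever direction comes out, the distance from the origin is always the chosen radius, which is why the boxes in the sketch end up arranged on a spherical shell rather than filling the space.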
```java
PVector random_location = PVector.random3D();
random_location.mult(100);
```
The final exercise is a visualization of some simple Twitter data. While there's a lot more code, most of the functions and designs used will be familiar.
The data/ directory contains a number of .csv files with the data to be visualized.
The users.csv contains:
- the user's handle
- the user's reach (number of followers)
- the user's party affiliation
For each user (US presidential nominees on Twitter) we have one file containing 5 of their most recent tweets, one per row:
- the tweet text
- the tweet's reach (number of retweets)
❓ Study `ex8.pde`. There are a number of variables that are declared in such a way that they are visible to every function in the sketch. What are the names of these 5 variables and what information do they store?
❓ How many `for`-loops are used in this exercise? Are any of them nested inside each other? For each `for`-loop describe its behavior verbally and place that in the larger context of the goals of the visualization.
👉 Modify the code so that Twitter users are represented in the scene by `Box` objects instead of `Blob` objects.
👉 Modify the code which adds the spheres representing tweets to the scene. The size of the sphere should correspond to the number of retweets that each tweet received. This data is contained in the `data/` directory, in the `retweet_count` field for each user's list of tweets.