Lecture 1: Data Visualization, Perception, and Gestalt Principles

Slides about Lecture 1, focusing on data visualization, human perception, and Gestalt principles. The PDF, a detailed presentation for university-level computer science students, explores the definition of data visualization, visual perception, and key Gestalt principles such as grouping and proximity.

Lecture 1

WHAT IS DATA VISUALIZATION?

  • Data visualization is a display of data designed to enable analysis, exploration, and discovery
  • It is the graphical display of abstract information for sense-making and communication

THE VISUAL SPECIES

We, as humans, are a visual and symbolic species. Our brains have the ability to create and understand visual representations with different degrees of abstraction

  • To see and to understand are deeply intertwined processes: we understand because we see, we see because we understand. Seeing precedes understanding, and this understanding precedes a better, deeper seeing.

The purpose of infographics and data visualizations is to enlighten and inform people, not to entertain them or to sell them products, services, or ideas.

Infographics and Data Visualization Uses

  • Infographics and data visualization may be used for:
    • Presentation of information, by means of statistical charts, maps and diagrams (in particular, this can be achieved with infographics)
    • Exploration and analysis of data sets: users can find patterns and therefore gain unique insights (in particular, this can be achieved with data visualization)

Storytelling in Data

  • Storytelling allows you to:
    • Highlight essential key points for the audience
    • Focus audience's attention
    • Provide a human touch to your data
    • Provide context
    • Improve audience engagement

Data Stories vs. Traditional Storytelling

  • Data stories differ in important ways from traditional storytelling:
    • Stories in text and film typically present a set of events in a tightly controlled progression.
    • Visualized data can also be interactive, inviting verification, new questions, and alternative explanations.

DATA VIZ AS A PROCESS

  • Data is the essential input for information visualisation
  • Visual encoding is the way in which data is transformed into a visualisation
  • Perception is the process of looking at visual information and extracting information from it

Data Analytics and Visualization Interaction

  • Data analytics and visualization make sense of that data
    • The user can interact with each element of the visualisation (a minimal encoding sketch follows)
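
As an illustration (not from the slides), here is a minimal visual-encoding sketch in Python with matplotlib, using invented data: each record's values are mapped to position, marker size, and colour, which is the encoding step the viewer then perceives and can interact with.

```python
# Minimal visual-encoding sketch (hypothetical data): each data value is
# mapped to a graphical attribute -- x/y position, marker size, and colour.
import matplotlib.pyplot as plt

# Raw data: (hours studied, exam score, courses taken) -- invented values
records = [(2, 55, 1), (4, 62, 2), (6, 71, 2), (8, 83, 3), (10, 90, 4)]

hours   = [r[0] for r in records]   # encoded as x position
scores  = [r[1] for r in records]   # encoded as y position
courses = [r[2] for r in records]   # encoded as marker size and colour

plt.scatter(hours, scores, s=[40 * c for c in courses], c=courses, cmap="viridis")
plt.colorbar(label="courses taken")  # legend for the colour encoding
plt.xlabel("hours studied")
plt.ylabel("exam score")
plt.title("Visual encoding: data values mapped to position, size, colour")
plt.show()
```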

Historical Examples of Visualization

  • 15000 B.C.
    • The Lascaux Cave paintings are among the earliest depictions of a star map, showing several constellations such as the Pleiades and the Summer Triangle
  • 10th Century
    • This is the oldest known example of a graphical representation of changing values, showing the inclination of planetary orbits as a function of time
  • 1644
    • The Flemish astronomer Michael Florent van Langren produces what is believed to be the first visual representation of statistical data
  • 1786
    • William Playfair, a Scottish economist, was the first to use elements such as graduated and labeled axes, grid lines, titles, annotations, and color to highlight differences. This graph shows imports and exports between England and other countries
  • 1924
    • Otto Neurath developed the Isotype (International System of Typographic Picture Education) method to communicate statistical information to the public in a pictorial way

Lecture 2

  • Science is a stance, a way to look at the world, that everybody can embrace. Science is a set of methods, a body of knowledge, and the means to communicate it

Algorithm to Scientific Discovery

  • The algorithm to scientific discovery follows these steps:
    • 1. You grow curious about a phenomenon, you explore it, and you formulate a plausible conjecture to describe it, explain it, or predict its behaviour
      • When making conjectures it is important that they make sense based on the existing knowledge of how nature works. Usually, conjectures are made when you notice an interesting event and then a possible cause-effect relation
      • For a conjecture to be good it must be testable: you should be able to weigh your conjecture against evidence
      • Being testable means being falsifiable, because a conjecture that can't be refuted is a bad conjecture.
      • Also, a good conjecture is made of several components, and these need to be hard to change without making the whole conjecture useless
    • 2. You transform your conjecture into a formal and testable proposition, called a hypothesis
      • A conjecture that is formalized so that it can be tested empirically is called a hypothesis. When producing hypotheses, we are studying variables
      • A variable is something whose values can change somehow; a hypothesis typically relates two kinds of variables (a small sketch follows this list):
        • A predictor or explanatory variable, called an independent variable
        • An outcome or response variable, also known as the dependent variable
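
As a toy illustration (not from the slides), the sketch below uses invented numbers: hours of sleep plays the role of the independent (explanatory) variable and reaction time the dependent (outcome) variable, and the Pearson correlation quantifies how strongly the outcome co-varies with the predictor.

```python
# Hypothetical example: independent variable (hours of sleep) vs
# dependent variable (reaction time, ms). All numbers are invented.
import numpy as np

sleep_hours   = np.array([4, 5, 6, 7, 8, 9])              # independent / explanatory
reaction_time = np.array([420, 400, 370, 350, 340, 335])  # dependent / outcome

# Pearson correlation: how strongly the outcome co-varies with the predictor
r = np.corrcoef(sleep_hours, reaction_time)[0, 1]
print(f"correlation between predictor and outcome: r = {r:.2f}")
```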

Skepticism in Information Gathering

  • When gathering information from any source, it's crucial to be a bit sceptical. Ask yourself whether the variables outlined in the study and the methods used for measurement and comparison accurately reflect the reality being analysed

Variable Types and Measurement Scales

  • Variables come in many different types, and they can be studied based on the scale by which they are measured
  • The differences between variable types are important for data analysis and statistical interpretation, as they determine the proper methods and operations that should be used
  • The differences in variables will also influence the methods of representation selected for our visualizations.
  • Variables can be classified into discrete and continuous.
    • A discrete variable is one that can only adopt certain values.
    • A continuous variable is one that can adopt any value on the scale of measurement that you're using
  • Levels of measurement (or data types) reflect the accuracy with which a variable has been quantified, and they determine the methods that can be used to extract insight from the data
  • They form a hierarchy in which each level builds on the one before (a summary sketch in code follows the list):
    • Nominal: the data is categorized without any inherent order or ranking
      • Nominal data is a type of qualitative data that groups variables into categories; it is also known as categorical data
      • Categories are labels: they are purely descriptive and carry no numeric or quantitative value
      • Mutually exclusive: each observation belongs to exactly one category
      • Purely descriptive: the labels don't provide quantitative or numeric value
      • Non-hierarchical: no category is greater than or worth more than another
    • Ordinal: the data is categorized with a clear order or ranking, but the intervals between them are not uniform
      • Ordinal data is a form of qualitative data that classifies variables into descriptive categories
      • It is characterized by the fact that the employed categories are ranked on a hierarchical scale
      • It falls under the category of non-numeric and categorical data, but it can still make use of numerical values as labels
      • The categories are always ranked in a hierarchy
      • Ordinal data allows you to calculate frequency distributions, the mode, the median, and the range
    • Interval: the data is categorized with uniform intervals, but no true zero point
      • Interval data is a type of quantitative (numerical) data that groups variables into categories and uses an ordered scale. The interval values are always ordered and separated by an equal measure of distance.
      • Ordered scale: interval data is measured using continuous intervals that show order, direction, and a consistent difference in values
      • Equal measure of distance: the difference between values is always evenly distributed and is known as the interval
      • Lack of true zero: interval data has no true zero, meaning that zero does not represent the absence of the quantity
      • Interval: the term refers to the consistent difference between values in this type of data
      • Examples of interval data are:
        • Temperature (in Celsius or Fahrenheit)
        • IQ scores
      • When determining whether a scale is interval or ordinal, a critical consideration is the presence of fixed measurement units with known and consistent intervals between any two points on the scale
    • Ratio: the data is categorized with uniform intervals and a true zero point
      • Ratio data is a form of quantitative data that measures variables on a continuous scale, with an equal distance between adjacent values
      • Ordered scale with precision: the scale used to measure ratio data shows order, direction, and a precise difference in values
      • True zero: zero represents the absence of the variable (there are no negative values)
      • Arithmetic operations: ratio values can be added, subtracted, multiplied, and divided
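
The sketch below (invented values, Python's statistics module) summarises how the level of measurement constrains which summary statistics are meaningful; the variable names and numbers are illustrative only.

```python
# Sketch (invented data): the level of measurement decides which summaries make sense.
import statistics as st

nominal  = ["red", "blue", "red", "green", "red"]  # categories: only the mode is meaningful
ordinal  = [1, 2, 2, 3, 4, 5]                      # rankings: median and range also apply
interval = [21.5, 22.0, 23.5, 19.0]                # temperature in Celsius: differences and mean, no true zero
ratio    = [1.65, 1.80, 1.72, 1.90]                # height in metres: true zero, so ratios are meaningful

print("nominal  -> mode:", st.mode(nominal))
print("ordinal  -> median:", st.median(ordinal), "| range:", max(ordinal) - min(ordinal))
print("interval -> mean:", st.mean(interval))      # but 40 degC is not "twice as hot" as 20 degC
print("ratio    -> mean:", st.mean(ratio), "| max/min ratio:", round(max(ratio) / min(ratio), 2))
```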
  • 3. You study and measure the phenomenon (under controlled conditions whenever possible); these measurements become data that you can use to test your hypothesis
    • Once a hypothesis is posed, it's time to test it against reality; this is called a study, which can be:
      • Observational or cross-sectional, which considers data collected just at a particular point in time
      • Longitudinal, which considers data collected for an extended period
    • Choose a random sample of the population you want to study rather than a hand-picked selection; this will yield more truthful results
    • Randomisation is useful to deal with extraneous variables - those not the primary focus of a study but with the potential to impact outcomes or results
    • These variables, which can introduce error and potentially confound relationships between studied variables, come in two forms:
      • Confounding variable, which can be identified and incorporated into our model
      • Lurking variable, which we don't include in our analysis for the simple reason that its existence is unknown to us. Whenever it is realistic and feasible to do so, researchers design controlled experiments, which help minimise the influence of confounding variables
    • There are many types of experiments, which share common characteristics (a sketch of a basic randomised experiment follows this list):
      • Experiments typically involve observing numerous subjects that are representative of the population under study
      • Subjects are divided into at least two groups: an experimental group and a control group
      • Subjects in the experimental group are exposed to a specific condition
      • Those in the control group are subjected to a different condition or, in some cases, to no condition at all
      • Researchers measure the outcomes for subjects in both the experimental and control groups, comparing the results
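
As an illustration (not from the slides), here is a sketch of such an experiment with invented numbers: subjects are randomly assigned to an experimental and a control group, only the experimental group receives the condition, and the mean outcomes of the two groups are compared.

```python
# Sketch of a randomised experiment (all values invented).
import random
random.seed(42)                                  # reproducible randomisation

subjects = list(range(20))                       # 20 hypothetical subjects
random.shuffle(subjects)                         # randomisation guards against extraneous variables
experimental, control = subjects[:10], subjects[10:]

def measure_outcome(treated: bool) -> float:
    # Placeholder measurement: a noisy baseline, plus an effect if treated
    return 50 + random.gauss(0, 5) + (8 if treated else 0)

exp_scores  = [measure_outcome(treated=True)  for _ in experimental]
ctrl_scores = [measure_outcome(treated=False) for _ in control]

diff = sum(exp_scores) / len(exp_scores) - sum(ctrl_scores) / len(ctrl_scores)
print(f"mean outcome difference (experimental - control): {diff:.1f}")
```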
  • 4. You draw conclusions, based on the evidence you have obtained. Your data and tests may force you to reject your hypothesis, in which case you'll need to go back to the beginning, or your hypothesis may be tentatively corroborated
  • 5. Eventually, after repeated tests and after your work has been reviewed by your peers (members of your knowledge or scientific community), you may be able to put together a systematic set of interrelated hypotheses to describe, explain, or predict phenomena, called a theory

Anatomy of a Graphic

  • A big part of the context in data visualization is the text
  • This component of a graph gives the reader visual clues that help the data tell a story, and it should allow the graph to stand alone, outside of any supporting narrative (a minimal example follows)
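
A minimal matplotlib sketch of these textual components, with invented numbers: the title states the point, the axis labels carry the units, and an annotation adds context so the chart can stand alone.

```python
# Sketch (invented numbers): title, axis labels, and an annotation let the
# graph stand on its own, outside of any supporting narrative.
import matplotlib.pyplot as plt

years = [2019, 2020, 2021, 2022, 2023]
sales = [120, 95, 140, 160, 175]                         # hypothetical values

plt.plot(years, sales, marker="o")
plt.title("Yearly sales recover after the 2020 dip")     # headline that states the point
plt.xlabel("Year")
plt.ylabel("Sales (thousand units)")                     # units live in the axis label
plt.annotate("lockdown year", xy=(2020, 95), xytext=(2020.3, 115),
             arrowprops=dict(arrowstyle="->"))           # annotation provides context
plt.xticks(years)                                        # avoid fractional year ticks
plt.show()
```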

SCALE in Data Visualization

  • (See the graph on slide 31 for examples of scales)
  • Scales define the association of data point values with graphical coordinates; they can typically be (see the sketch after the list):
    • Linear, used for many line graphs and scatter plots
    • Logarithmic, used for line graphs and scatter plots where the range of values is very large
    • Time, a special case of linear scales where the unit of measure is time, most frequently in the form of line graphs
    • Ordinal, used when relative position of items is important but there is no mathematical basis to establish distance between units on the scale
    • Categorical, also known as nominal, used for bar graphs, column graphs, and pie charts
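
As an illustration of the first two scale types, the matplotlib sketch below plots the same invented wide-range series on a linear and on a logarithmic axis; the log scale keeps both the small and the large values readable.

```python
# Sketch (invented data): the same series on a linear vs a logarithmic scale.
import matplotlib.pyplot as plt

x = list(range(1, 8))
y = [3, 10, 45, 200, 1100, 6000, 32000]      # values spanning four orders of magnitude

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))

ax_lin.plot(x, y, marker="o")
ax_lin.set_title("Linear scale")

ax_log.plot(x, y, marker="o")
ax_log.set_yscale("log")                     # logarithmic scale for large value ranges
ax_log.set_title("Logarithmic scale")

plt.tight_layout()
plt.show()
```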
