Slides from Unisr.it introducing statistics: variables and frequency distributions. The PDF, useful for university-level mathematics, gives a clear overview of the fundamental concepts of statistics, including examples and exercises with solutions for self-assessment.
P. Rancoita (rancoita.paolamaria@unisr.it)
[Diagram: Population (parameters) → Sampling → Sample (descriptive statistics; estimates of the parameters) → Statistical inference (generalization of the results) → back to the Population.]
Example. We select 100 lung cancer patients from the national cancer registry (CR), in order to estimate the mean number of cigarettes smoked per day by a lung cancer patient, before the diagnosis of the disease.
target population = all lung cancer patients
parameter = mean number of cigarettes smoked per day before the diagnosis
sample = 100 lung cancer patients from the CR
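The relation between population, parameter, and sample in this example can be sketched in Python. This is only an illustration: the population values are simulated, and the names `population`, `sample`, and the range 0 to 40 cigarettes per day are assumptions, not data from the registry.

```python
import random
import statistics

# Hypothetical target population: cigarettes smoked per day by each of
# 10,000 registry patients (simulated values, not real data).
rng = random.Random(0)
population = [rng.randint(0, 40) for _ in range(10_000)]

# Parameter: the population mean, which is unknown in a real study.
population_mean = statistics.mean(population)

# Sample: 100 patients drawn at random, without replacement.
sample = rng.sample(population, 100)

# Estimate: the sample mean, used to infer the population parameter.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"estimate  (sample mean):     {sample_mean:.2f}")
```

In a real study only the sample is observed; the simulated population is included here just to show that the estimate lands near the parameter.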
Statistics is necessary (or must be accounted for) in every phase of a study:
About 50% of the published literature is thought to have some shortcoming from a statistical point of view (Ercan et al., Eur. J. Gen. Med. 2007).
A correct design of the study allows:
Example. When studying a disease for which age is a prognostic factor, two groups are not comparable if they do not have the same age distribution (e.g. one group contains more young subjects than the other).
Example. In "Emotional category data on images from the International Affective Picture System" (Mikels et al., 2005), the authors wanted to identify which images elicited a particular emotion more than others. They used samples of students with a mean age of 18-19 years, so their findings cannot be generalized to older individuals.
Precise data collection is the basis of good research.
Example. The presence of even a single poorly measured or misreported data point may completely alter the result of the analysis (right-hand panel).
[Figure: two scatter plots, both axes ranging from 0 to 120; the right-hand panel contains one poorly measured data point.]
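The effect of a single bad measurement can be sketched numerically. The data below are hypothetical, and the Pearson correlation coefficient is used only as an assumed example of "the analysis"; the slide does not say which analysis was run.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, well-measured data lying close to a straight line.
x_clean = [20, 40, 60, 80, 100]
y_clean = [22, 38, 61, 79, 102]

# The same data plus one badly recorded observation (110, 5).
x_bad = x_clean + [110]
y_bad = y_clean + [5]

r_clean = pearson_r(x_clean, y_clean)
r_with_outlier = pearson_r(x_bad, y_bad)

print(f"r without the bad point: {r_clean:.3f}")
print(f"r with the bad point:    {r_with_outlier:.3f}")
```

With these made-up numbers, one extra point is enough to turn a nearly perfect linear association into a weak one.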
Example. Several statistical methods assume that all observations are independent. In some studies, measurements are taken before and after the treatment in order to assess its efficacy. However, data referring to the same subject are dependent, so methods appropriate for dependent data must be used in the analysis.
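A small sketch of why the dependence matters follows. The slide does not name a specific method; the paired t statistic is used here as one common choice for before/after data, and all measurements are hypothetical.

```python
import math
import statistics

# Hypothetical measurements on the same 5 subjects (not real data).
before = [150, 160, 170, 180, 190]
after = [145, 156, 164, 175, 185]
n = len(before)

# Paired approach: analyse the within-subject differences.
diffs = [b - a for b, a in zip(before, after)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Naive approach: treat the two sets as independent groups (Welch form).
se_indep = math.sqrt(statistics.variance(before) / n + statistics.variance(after) / n)
t_independent = (statistics.mean(before) - statistics.mean(after)) / se_indep

print(f"paired t statistic:      {t_paired:.2f}")
print(f"independent t statistic: {t_independent:.2f}")
```

With these numbers the paired statistic is about 15.8, while the statistic that ignores the pairing is about 0.5: the large variability between subjects hides a very consistent within-subject change.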
Example. We want to represent the weights of a group of patients together with their mean.
Wrong solution: A graph like the ones below may give a misleading interpretation of the data, especially if the weights show a particular trend with respect to the order of the patients.
[Figure: two panels plotting Weight (Kg), about 50 to 80, against Patient (1 to 10), each with the mean marked.]
Correct solution: A graph like the one below gives a better idea of which weights are most represented in the sample.
[Figure: histogram of Weight (Kg), 40 to 90, against Frequency, 0 to 3, with the mean marked.]
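A frequency-based summary of this kind can be sketched as follows. The ten weights and the choice of 10 kg classes are assumptions made for illustration; they are not the values behind the slide's figure.

```python
from collections import Counter
import statistics

# Hypothetical weights (kg) of 10 patients, listed in no particular order.
weights = [52, 58, 61, 64, 66, 68, 71, 73, 77, 84]

# Group each weight into a 10 kg class identified by its lower bound.
counts = Counter((w // 10) * 10 for w in weights)
mean_weight = statistics.mean(weights)

# Text histogram: one row per class, one '#' per patient.
for lower in sorted(counts):
    print(f"{lower}-{lower + 10} kg | {'#' * counts[lower]}")
print(f"mean = {mean_weight:.1f} kg")
```

Unlike a plot of weight against patient number, this display does not depend on the order in which the patients are listed.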
For a correct interpretation of the results, it is necessary to account for:
When a statistical analysis shows a significant association between two variables, interpreting this association as causality goes beyond what a standard statistical analysis can establish and can be supported only by clinical/biological knowledge of the phenomenon.
Example (possible misinterpretation of association results). When analyzing data on coronary heart disease (CHD), one usually finds an association between heavy coffee drinking and CHD mortality. Nevertheless, the real risk factor for CHD is heavy smoking (which is also associated with CHD mortality).
Heavy coffee drinking is associated with CHD mortality (although it is not the cause), because often heavy smokers are also heavy coffee drinkers.
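[Diagram: cigarette smoking (confounding factor) is linked to both coffee drinking and CHD mortality; the link between coffee drinking and CHD mortality is drawn as a dashed line.]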
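The confounding mechanism can be illustrated with a made-up table of counts. All numbers below are hypothetical and were chosen so that coffee has no effect within each smoking group; the risk ratio is used as an assumed measure of association.

```python
# Hypothetical counts (not real data): (smoker, coffee) -> (deaths, subjects).
counts = {
    (True, True): (80, 800),    # smokers, heavy coffee drinkers
    (True, False): (20, 200),   # smokers, not heavy coffee drinkers
    (False, True): (4, 200),    # non-smokers, heavy coffee drinkers
    (False, False): (16, 800),  # non-smokers, not heavy coffee drinkers
}

def risk(groups):
    """CHD mortality risk pooled over the given (smoker, coffee) groups."""
    deaths = sum(counts[g][0] for g in groups)
    total = sum(counts[g][1] for g in groups)
    return deaths / total

# Crude analysis: ignore smoking, compare coffee drinkers with the rest.
rr_pooled = risk([(True, True), (False, True)]) / risk([(True, False), (False, False)])

# Stratified analysis: compare within each smoking group.
rr_smokers = risk([(True, True)]) / risk([(True, False)])
rr_nonsmokers = risk([(False, True)]) / risk([(False, False)])

print(f"risk ratio, all subjects: {rr_pooled:.2f}")
print(f"risk ratio, smokers:      {rr_smokers:.2f}")
print(f"risk ratio, non-smokers:  {rr_nonsmokers:.2f}")
```

In this table the crude risk ratio is above 2, yet it equals 1 within both smokers and non-smokers: the apparent coffee effect arises only because most coffee drinkers are also smokers.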
population = collection of subjects or objects of interest (target) that share common observable characteristics
unit = any individual or element of the target population
⇓ we select
sample = subset of the (target) population which is representative of the entire population
Definition. A variable is any kind of observable characteristic that can vary among the units of a population.
Example. Examples of variables are: sex and age.
Definition. A parameter of the population is a numerical characteristic related to a variable of the (target) population.
Example. Examples of parameters related to the previous variables are: the percentage of females and the mean age.
Definition. A datum (a single data value) is the observed value of a variable for one particular unit of the sample.
Example. In a study, the reported values of sex and age of the patients are data.
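The three notions can be put side by side in a short sketch. The five patients are hypothetical; the quantities computed from them are sample estimates of the parameters "percentage of females" and "mean age".

```python
import statistics

# Hypothetical sample: each dict is one unit, each key is a variable,
# and each stored value is a datum.
sample = [
    {"sex": "F", "age": 34},
    {"sex": "M", "age": 51},
    {"sex": "F", "age": 47},
    {"sex": "M", "age": 62},
    {"sex": "F", "age": 29},
]

# Sample estimate of the parameter "percentage of females".
pct_female = 100 * sum(unit["sex"] == "F" for unit in sample) / len(sample)

# Sample estimate of the parameter "mean age".
mean_age = statistics.mean(unit["age"] for unit in sample)

print(f"females: {pct_female:.0f}%, mean age: {mean_age:.1f} years")
```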
Definition. A variable is called categorical (or qualitative) if its values denote membership in a category/group, that is, if its values represent a particular quality of the units of the population.
The possible categories of a variable must be mutually exclusive, that is, a unit cannot belong to more than one category.
Example. The variable sex is categorical, since its values are: male and female.
Definition. A variable is called numerical (or quantitative) if its values represent quantities that can be measured or counted.
Example. The variable age is numerical.
Remark. A numerical variable can be transformed into a categorical one by dividing the interval of all its possible values into subintervals, which then define the categories of the new variable.
Example. Age can be divided into three classes: < 30, between 30 and 60 (both inclusive), > 60. The resulting variable is categorical (and, in particular, ordinal).
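The age classes of this example translate directly into a small function. The function name `age_class`, the class labels, and the list of ages are illustrative choices.

```python
def age_class(age):
    """Map an age in years to one of the three classes from the example."""
    if age < 30:
        return "<30"
    if age <= 60:  # 30 and 60 are both included in the middle class
        return "30-60"
    return ">60"

ages = [24, 30, 45, 60, 61, 78]  # hypothetical data
classes = [age_class(a) for a in ages]
print(classes)
```

Because the boundaries 30 and 60 fall in the middle class, every age belongs to exactly one class, which keeps the categories mutually exclusive.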
Remark. In a database or in a case report form (CRF), the categories of a categorical variable are often labeled with numbers. Therefore, it is necessary to understand the "real meaning" of the labels (numbers) in order to determine the type of variable.
Example. The values of the level of satisfaction can be denoted as: 1(=low), 2(=medium) and 3(=high).
Although the values of the variable are labeled with numbers, they represent three categories (low, medium, high) and thus the variable is categorical (and not numerical).
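A minimal sketch of decoding such numeric labels before analysis is shown below. The mapping follows the example above, while the raw values and the name `SATISFACTION_LABELS` are hypothetical.

```python
from collections import Counter

# Coding scheme from the example: numbers are labels, not quantities.
SATISFACTION_LABELS = {1: "low", 2: "medium", 3: "high"}

# Hypothetical raw values as stored in a database.
raw_values = [1, 3, 2, 3, 1]

# Decode the numeric labels into the categories they stand for.
decoded = [SATISFACTION_LABELS[code] for code in raw_values]

# For a categorical variable, the natural summary is a frequency count.
frequencies = Counter(decoded)
print(frequencies)
```

Decoding first makes it harder to compute, by mistake, a quantity such as the arithmetic mean of the labels, which has no meaning for categories.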
Definition. A categorical variable is called ordinal if its values (or categories) have an intrinsic (and not simply "aesthetic") order. Otherwise, the variable is called nominal. A nominal variable that assumes only two values is called dichotomous (or dichotomic).
Example (1). The level of satisfaction (which assumes the values: low, medium, high) is an ordinal variable. The categories of the variable can be ordered in the following way: low - medium - high.
Example (2). The presence of fever (which assumes the values: no/yes) is a nominal (dichotomic) variable. In fact, it is not possible to order the values no and yes.
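The practical difference between the two types can be sketched as follows. The observed values are hypothetical, and the variable names are illustrative.

```python
from collections import Counter

# Ordinal variable: the categories have an intrinsic order.
SATISFACTION_ORDER = ["low", "medium", "high"]
satisfaction = ["high", "low", "medium", "low", "high"]  # hypothetical data

# Sorting by rank is meaningful, and so is the median category.
ordered = sorted(satisfaction, key=SATISFACTION_ORDER.index)
median_level = ordered[len(ordered) // 2]

# Nominal (dichotomous) variable: only counts per category are meaningful.
fever = ["no", "yes", "no", "no", "yes"]  # hypothetical data
fever_counts = Counter(fever)

print(ordered, median_level)
print(fever_counts)
```

For the ordinal variable a median category exists because the values can be ranked; for the nominal one, the frequency of each category is the only sensible summary.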