Categorical Data

Data that can be grouped into specific, non-numerical categories

Quantitative data

Data that use numeric values to indicate how much or how many

There are limitations to ____ data

Categorical

Descriptive Statistics

Summaries of data in a form that is easier for the reader to understand

These include tabular, graphical, and numerical presentations of data

Population

The set of all elements of interest in a particular study

Sample

A subset of the population

Statistical Inference

Use of data from a sample to make estimates and predictions about the characteristics of a population

Frequency Distribution

A tabular summary of data showing the number of observations in each of several non-overlapping categories

Relative Frequency

The fraction or percentage of observations belonging to a class
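A minimal Python sketch of a frequency distribution and relative frequencies for categorical data. The function name `frequency_distribution` and the survey responses are hypothetical, made up for illustration.

```python
from collections import Counter

def frequency_distribution(observations):
    """Return {category: (frequency, relative frequency)} for categorical data."""
    n = len(observations)              # number of observations
    counts = Counter(observations)     # frequency of each non-overlapping category
    # Relative frequency = frequency of the class / n
    return {category: (count, count / n) for category, count in counts.items()}

# Hypothetical survey responses (illustration only)
responses = ["Yes", "No", "Yes", "Yes", "Undecided"]
for category, (freq, rel) in frequency_distribution(responses).items():
    print(f"{category}: frequency = {freq}, relative frequency = {rel:.2f}")
```

Here "Yes" has frequency 3 and relative frequency 3 / 5 = 0.60.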

n in the formula means

The number of observations

k is the number of

non-overlapping classes

Rule for choosing k (the "2 to the k" rule)

$2^k \ge n$: choose the smallest k that satisfies this inequality
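The rule can be applied as a small search: start at k = 1 and increase k until 2^k reaches n. A sketch in Python; `number_of_classes` is a hypothetical helper name.

```python
def number_of_classes(n):
    """Return the smallest k (at least 1) such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

# 2**5 = 32 is less than 40, while 2**6 = 64 is at least 40, so k = 6
print(number_of_classes(40))
```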

Steps for creating a frequency distribution for quantitative data

1. Determine the number of non-overlapping classes, called k

2. Determine the width of each class

3. Determine the class limits
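A sketch of the three steps in Python, under two assumptions not stated in these notes: the data are whole numbers, and the class width is (largest value minus smallest value) / k rounded up to a whole number, bumped by one when the division is exact so the largest value still lands in the last class. The function name and the data are hypothetical.

```python
def quantitative_frequency_distribution(data):
    """Return (lower limit, upper limit, frequency) for each class of integer data."""
    n = len(data)
    smallest, largest = min(data), max(data)

    # Step 1: number of non-overlapping classes, using the 2^k >= n rule
    k = 1
    while 2 ** k < n:
        k += 1

    # Step 2: class width = (largest - smallest) / k rounded up; when the division
    # is exact we still add 1, so the k classes always reach the largest value
    width = (largest - smallest) // k + 1

    # Step 3: class limits; the first lower limit is the smallest value and
    # each class holds the whole numbers lower..upper inclusive
    classes = []
    lower = smallest
    for _ in range(k):
        upper = lower + width - 1
        frequency = sum(1 for x in data if lower <= x <= upper)
        classes.append((lower, upper, frequency))
        lower = upper + 1
    return classes

# Hypothetical data set of n = 10 whole numbers
data = [3, 7, 8, 12, 15, 15, 18, 21, 22, 25]
for lower, upper, frequency in quantitative_frequency_distribution(data):
    print(f"{lower}-{upper}: {frequency}")
```

With n = 10 the rule gives k = 4 (2^4 = 16 ≥ 10), the width is 22 // 4 + 1 = 6, and the classes are 3-8, 9-14, 15-20 and 21-26 with frequencies 3, 1, 3 and 3.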

Histogram

A bar graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis.

Class frequencies are represented by the height of the bars, with the bars drawn adjacent to each other

One of the most important uses of a histogram is to

provide information about the shape or form of a distribution

Skewness

The tendency of a histogram to be "off center," with one tail longer than the other

Skewness to the right

The histogram's tail extends farther to the right

Skewness to the left

The histogram's tail extends farther to the left

Symmetric

The histogram is neither skewed left nor right
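To see how a histogram reveals shape, here is a minimal text-based sketch in Python. It draws one column per class (left to right, as on the horizontal axis), with column height equal to the class frequency and the columns side by side. The function name and the frequencies are hypothetical, chosen so the tail extends to the right.

```python
def text_histogram(frequencies):
    """Return the rows of a vertical text histogram, top row first.

    Each character position is one class; '#' means that class's bar
    reaches this height, '.' means it does not.
    """
    rows = []
    for level in range(max(frequencies), 0, -1):
        rows.append("".join("#" if f >= level else "." for f in frequencies))
    return rows

# Hypothetical class frequencies: most values in the first classes, a long right tail
for row in text_histogram([5, 3, 2, 1, 1]):
    print(row)
```

The bars pile up on the left and thin out toward the right, the pattern described above as skewness to the right.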

Scatter diagram

A graphical display of the relationship between two quantitative variables

Trendline

A line that provides an approximation of the relationship between two variables

Three types of relationships depicted by scatter diagrams

Positive relationships

Negative relationships

No apparent relationships

Positive relationships

The scatter diagram suggests an "upward" pattern. A positive relationship indicates that the two variables move in the SAME DIRECTION

Negative relationship

The scatter diagram suggests a "downward" pattern. A negative relationship indicates that the two variables move in opposite directions

No apparent relationship

The scatter diagram suggests a "random" pattern
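One common way to compute a trendline is the least-squares line; these notes do not say which method is used, so treat that choice as an assumption. The Python sketch below fits the line and reads the direction of the relationship from the sign of its slope. The function names, the `tolerance` cutoff for calling a slope flat, and the data are hypothetical; a slope near zero is only a hint, and the scatter diagram itself should still look random before concluding there is no apparent relationship.

```python
def trendline(xs, ys):
    """Least-squares line y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

def describe_relationship(slope, tolerance=1e-9):
    """Classify the trendline's direction; tolerance is an arbitrary cutoff."""
    if slope > tolerance:
        return "positive"       # variables move in the same direction
    if slope < -tolerance:
        return "negative"       # variables move in opposite directions
    return "no apparent relationship"

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = trendline(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f}x -> {describe_relationship(b1)}")
```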

Mean

The average value for a variable. It provides a measure of the central location for the data

Two different means:

Population mean

Sample mean

Population mean

The average value for all the observations in the population

The population mean is denoted by

$\mu$ (the Greek letter mu)

Sample Mean

The average value for all the observations in the sample

The sample mean is denoted by

$\bar{x}$ (an x with a bar above it, read "x-bar")

Parameter

Characteristic of a population

Remember: the p's go together (population, parameter)

Statistic

A characteristic of a sample

Remember: the s's go together (sample, statistic)

Population Mean

$\mu = \frac{\sum x_i}{N}$

$\Sigma$ (the Greek capital letter sigma) is the summation operator

N = the total number of observations in the population

Sample mean

$\bar{x} = \frac{\sum x_i}{n}$

where $\Sigma$ is the summation operator

n = the total number of observations in the sample
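The population mean and the sample mean use the same arithmetic; only the data differ (all N values in the population versus the n values in the sample). A minimal Python sketch with hypothetical numbers; the sample values are drawn from the hypothetical population.

```python
def mean(values):
    """Sum of the observations divided by how many there are."""
    return sum(values) / len(values)

# Hypothetical population of N = 6 values and a sample of n = 3 drawn from it
population = [4, 8, 6, 5, 7, 6]
sample = [8, 7, 6]

mu = mean(population)   # a parameter: 36 / 6
x_bar = mean(sample)    # a statistic: 21 / 3
print(mu, x_bar)
```

Here mu = 6.0 and x-bar = 7.0: the sample mean estimates the population mean but need not equal it.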

Median

Another measure of central location. It is the middle value when the data are arranged in ascending order (smallest to largest value)

How to find the median of a data set

1. Arrange the data in ascending order

2. Determine whether the number of observations is odd or even

3. For an odd number of observations, the median is the middle value; for an even number, the median is the average of the two middle values
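The steps for finding the median, as a short Python sketch (the function name is hypothetical; the standard library's `statistics.median` does the same job):

```python
def median(data):
    values = sorted(data)     # Step 1: arrange in ascending order
    n = len(values)
    middle = n // 2
    if n % 2 == 1:            # Step 2: odd number of observations
        return values[middle]                          # the middle value
    return (values[middle - 1] + values[middle]) / 2   # even: average of the two middle values

print(median([7, 1, 5]))     # sorted: 1, 5, 7
print(median([7, 1, 5, 3]))  # sorted: 1, 3, 5, 7
```

The first call returns 5; the second returns (3 + 5) / 2 = 4.0.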

Mode

Another measure of central location. It is the value that occurs with the greatest frequency
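A minimal Python sketch of finding the mode with `collections.Counter`. It returns every value tied for the greatest frequency, since a data set can have more than one mode; the function name and data are hypothetical.

```python
from collections import Counter

def modes(data):
    """Return all values that occur with the greatest frequency."""
    counts = Counter(data)
    greatest = max(counts.values())
    return [value for value, count in counts.items() if count == greatest]

print(modes([2, 3, 3, 5, 7]))     # one mode: 3
print(modes([2, 3, 3, 5, 5, 7]))  # 3 and 5 tie
```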

Range

Largest value minus smallest value

Variance

A measure of variability that utilizes all the data; it is based on the squared differences between each data value and the mean

Standard deviation

The positive square root of the variance
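A sketch of these measures of variability in Python. The notes do not give the variance formulas, so this uses the standard ones as an assumption: population variance divides the sum of squared deviations from the mean by N, and sample variance divides by n - 1. Function names and data are hypothetical.

```python
import math

def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def variance(data, sample=True):
    """Sum of squared deviations from the mean, over n - 1 (sample) or N (population)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)

def standard_deviation(data, sample=True):
    """Positive square root of the variance."""
    return math.sqrt(variance(data, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical; mean = 5, squared deviations sum to 32
print(data_range(data))
print(variance(data, sample=False), standard_deviation(data, sample=False))
print(variance(data), standard_deviation(data))
```

For these data the range is 9 - 2 = 7, the population variance is 32 / 8 = 4.0 (standard deviation 2.0), and the sample variance is 32 / 7, about 4.571.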

z-score

Often called the standardized value, it is the number of standard deviations a data value x is from the sample mean $\bar{x}$
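As a formula, z = (x - x̄) / s, where s is the sample standard deviation. A minimal Python sketch using the standard library's `statistics.mean` and `statistics.stdev` (which divides by n - 1); the function name and data are hypothetical.

```python
import statistics

def z_score(x, data):
    """Number of sample standard deviations that x lies from the sample mean."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return (x - x_bar) / s

data = [10, 20, 30]  # hypothetical: mean 20, s = 10
print(z_score(30, data))  # one standard deviation above the mean
print(z_score(10, data))  # one standard deviation below the mean
```

The calls return 1.0 and -1.0; a positive z-score means x is above the mean, a negative one means below.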

Chebyshev's theorem

Enables us to make statements about the percentage of data values that must be within a specified number of standard deviations of the mean

Formula for Chebyshev's theorem

At least $1 - \frac{1}{c^2}$ of the data values must be within c standard deviations of the mean, where c is any value greater than 1
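A small Python sketch that evaluates the Chebyshev bound and checks it against a data set. The function names and data are hypothetical, and the check uses the population standard deviation (`statistics.pstdev`).

```python
import statistics

def chebyshev_minimum_fraction(c):
    """Smallest fraction of values guaranteed within c standard deviations of the mean."""
    if c <= 1:
        raise ValueError("Chebyshev's theorem requires c > 1")
    return 1 - 1 / c ** 2

def fraction_within(data, c):
    """Actual fraction of values within c standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mean) <= c * sd) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical; mean 5, standard deviation 2
for c in (1.5, 2, 3):
    print(c, round(chebyshev_minimum_fraction(c), 3), fraction_within(data, c))
```

For c = 2 the theorem guarantees at least 75% of the values; in this data set all of them fall within 2 standard deviations of the mean, which satisfies the bound.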