Method of correlation analysis: an example

Research often requires finding a connection between resultant and factor variables (crop yield and the amount of precipitation; the height and weight of people in groups homogeneous by sex and age; pulse rate and body temperature; and so on).

Factor variables (the second kind) are the characteristics that contribute to changes in the resultant variables (the first kind) associated with them.

The concept of correlation analysis

There are many definitions of the term. Based on the foregoing, correlation analysis can be described as a method for testing the hypothesis that a statistically significant relationship exists between two or more variables that the researcher can measure but cannot change.

There are other definitions of the concept. Correlation analysis is a method of processing statistical data that consists in studying the correlation coefficients between variables. The correlation coefficients for one pair or several pairs of features are compared in order to establish statistical relationships between them. Correlation analysis is also defined as a method for studying the statistical relationship between random variables, which need not be strictly functional, in which a change in one random variable leads to a change in the mathematical expectation of another.

The concept of false correlation

When conducting correlation analysis, bear in mind that it can be applied to any combination of features, including ones that are absurd in relation to each other. Sometimes the features have no causal connection with each other.

In this case, the correlation is called false (spurious).

Tasks of correlation analysis

Based on the above definitions, the following tasks of the method can be formulated: to obtain information about one of the variables of interest using the other, and to determine the closeness of the relationship between the studied variables.

Correlation analysis involves determining the relationship between the studied characteristics, and therefore its tasks can be supplemented with the following:

  • identifying the factors that have the greatest impact on the resultant variable;
  • identifying previously unknown causes of relationships;
  • building a correlation model and analyzing its parameters;
  • studying the significance of the parameters of the relationship and estimating their intervals.

Relationship between correlation analysis and regression

The method of correlation analysis is often not limited to finding the closeness of the relationship between the quantities under study. Sometimes it is complemented by regression equations, which are obtained by regression analysis and describe the correlation dependence between the resultant variable and the factor variable or variables. Used together, the two make up the method of correlation and regression analysis.

Conditions for applying the method

Resultant indicators depend on one or several factors. The method of correlation analysis can be applied when there is a large number of observations of the values of the resultant and factor indicators, and the factors under study must be quantitative and recorded in specific sources. If the variables follow a normal distribution, the result of the correlation analysis is the Pearson correlation coefficient; if they do not follow this law, Spearman's rank correlation coefficient is used.
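The difference between the two coefficients can be illustrated with a minimal sketch in plain Python. The function names (`pearson`, `ranks`, `spearman`) and the sample values are illustrative assumptions, not part of the original text; Spearman's coefficient is computed here as the Pearson coefficient of the ranks, with tied values sharing an average rank.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

For a monotonic but non-linear relationship like this one, the rank coefficient reaches 1 while the Pearson coefficient stays below it.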

Rules for selecting factors for correlation analysis

When applying this method, it is necessary to determine the factors that influence the resultant indicators. They are selected on the basis that there must be a causal relationship between the indicators. When building a multifactor correlation model, the factors that have a significant impact on the resultant indicator are selected. Interdependent factors with a pair correlation coefficient greater than 0.85 should preferably be left out of the model, as should factors whose relationship with the resultant indicator is non-linear or functional.
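The 0.85 rule for interdependent factors can be sketched as a simple filter. This is only one possible reading: the text does not say which factor of a highly correlated pair to drop, so the sketch assumes a greedy pass that keeps the factor listed first. The names `select_factors`, `x1`, `x2`, `x3` and the data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_factors(factors, threshold=0.85):
    """Keep a factor only if |r| with every already kept factor is <= threshold.

    Assumption: factors are considered in insertion order, so the earlier
    factor of a highly correlated pair is the one that stays.
    """
    kept = []
    for name, values in factors.items():
        if all(abs(pearson(values, factors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical factor observations.
factors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 13],  # almost proportional to x1
    "x3": [5, 1, 4, 2, 6, 3],    # only weakly related to x1
}
print(select_factors(factors))
```

In practice the choice between two interdependent factors would also take into account which of them is more closely related to the resultant indicator.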

Results display

The results of the correlation analysis can be presented in text and graphic forms. In the first case, they are presented as a correlation coefficient, in the second - as a scatter diagram.

In the absence of correlation between the parameters, the points on the diagram are located randomly. A medium degree of correlation shows more order: the points lie at a more or less uniform distance from the median line. With a strong correlation the points tend toward a straight line, and at r = ±1 they lie exactly on a straight line. An inverse correlation runs from the upper left to the lower right of the graph; a direct correlation runs from the lower left to the upper right.

Three-dimensional representation of the scatter diagram

In addition to the traditional 2D scatter diagram, a 3D graphical representation of correlation analysis is now also used.

A scatterplot matrix is also used; it displays all the pairwise plots in a single picture in matrix form. For n variables, the matrix contains n rows and n columns. The plot at the intersection of the i-th row and the j-th column shows variable Xi against Xj. Thus each row and column corresponds to one dimension, and each cell displays a scatter diagram of two dimensions.

Evaluation of closeness

The closeness of the correlation is determined by the correlation coefficient (r): strong, r = ±0.7 to ±1; medium, r = ±0.3 to ±0.699; weak, r = 0 to ±0.299. This classification is not strict. The figure shows a slightly different scheme.
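The classification above can be written as a small helper. This is a sketch of the three bands given in the text only (the function name `closeness` is an assumption); since the text notes that the classification is not strict, the cut-offs of 0.3 and 0.7 should be treated as conventions.

```python
def closeness(r):
    """Label the closeness of a correlation using the bands from the text."""
    magnitude = abs(r)  # the sign only indicates direct or inverse correlation
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.3:
        return "medium"
    return "weak"

for value in (0.716, -0.45, 0.1):
    print(value, closeness(value))
```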

[Figure: scheme for evaluating the closeness of correlation]

An example of the application of the method of correlation analysis

A notable study was undertaken in the UK. It examined the link between smoking and lung cancer and was carried out using correlation analysis. The data are presented below.

The source data for the correlation analysis

Professional group                                                     Smoking  Mortality
Farmers, foresters and fishermen                                            77         84
Miners and quarry workers                                                  137        116
Manufacturers of gas, coke and chemicals                                   117        123
Glass and ceramics manufacturers                                            94        128
Workers for furnaces, forging, foundry and rolling mills                   116        155
Electrical and electronics workers                                         102        101
Engineering and related professions                                        111        118
Woodworking production                                                      93        113
Tanners                                                                     88        104
Textile workers                                                            102         88
Workwear manufacturers                                                      91        104
Employees of the food, drinking and tobacco industry                       104        129
Paper and printing manufacturers                                           107         86
Manufacturers of other products                                            112         96
Builders                                                                   113        144
Artists and decorators                                                     110        139
Drivers of stationary engines, cranes, etc.                                125        113
Workers not included elsewhere                                             133        146
Transport and communications workers                                       115        128
Warehouse workers, storekeepers, packers and workers filling machines      105        115
Office workers                                                              87         79
Sellers                                                                     91         85
Sports and recreation workers                                              100        120
Administrators and managers                                                 76         60
Professionals, technicians and artists                                      66         51

We begin the correlation analysis. For clarity, it is best to start with the graphical method, for which we construct a scatter diagram.

[Figure: scatter diagram for the correlation analysis example]

The scatter diagram shows a direct relationship. However, it is difficult to draw an unambiguous conclusion from the graphical method alone, so we continue the correlation analysis. An example of calculating the correlation coefficient is presented below.

Using software tools (for example, MS Excel, described later) we determine the correlation coefficient, which is 0.716; this indicates a strong relationship between the parameters under study. We then check the statistical significance of the obtained value against the corresponding table of critical values. To do this, we subtract 2 from the 25 pairs of values to get 23 degrees of freedom, and in that row of the table we find the critical r for p = 0.01 (because these are medical data, a stricter significance level is used; in other cases p = 0.05 is sufficient), which is 0.51 for this correlation analysis. Since the calculated r is greater than the critical r, the value of the correlation coefficient is considered statistically significant.
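The calculation can be reproduced outside a spreadsheet. The sketch below uses the 25 pairs from the source data and plain Python; the variable names are assumptions, and the critical value 0.51 is taken from the text rather than computed.

```python
import math

# The 25 (smoking, mortality) pairs from the source data, in table order.
smoking = [77, 137, 117, 94, 116, 102, 111, 93, 88, 102, 91, 104, 107,
           112, 113, 110, 125, 133, 115, 105, 87, 91, 100, 76, 66]
mortality = [84, 116, 123, 128, 155, 101, 118, 113, 104, 88, 104, 129, 86,
             96, 144, 139, 113, 146, 128, 115, 79, 85, 120, 60, 51]

n = len(smoking)
mean_x = sum(smoking) / n
mean_y = sum(mortality) / n

# Sums of products and squares of the deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(smoking, mortality))
sxx = sum((x - mean_x) ** 2 for x in smoking)
syy = sum((y - mean_y) ** 2 for y in mortality)

r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
df = n - 2                      # degrees of freedom: 25 pairs minus 2

R_CRITICAL = 0.51               # critical r for p = 0.01, df = 23 (from the text)

print(f"r = {r:.3f}, degrees of freedom = {df}")
print("statistically significant" if abs(r) > R_CRITICAL else "not significant")
```

Rounded to three decimals, the coefficient agrees with the 0.716 reported in the text.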

Use of software during the correlation analysis

The type of statistical data processing described here can be carried out using software, in particular MS Excel. Correlation analysis in Excel involves calculating the following parameters using functions:

1. The correlation coefficient is determined using the CORREL(array1; array2) function, where array1 and array2 are the cell ranges holding the values of the resultant and factor variables.

The linear correlation coefficient is also called the Pearson correlation coefficient, and therefore, starting with Excel 2007, you can use the PEARSON function with the same arrays.

Graphic display of correlation analysis in Excel is performed using the "Charts" panel with the choice of "scatter chart".

After specifying the source data we get the graph.

2. Estimation of the significance of the pair correlation coefficient using Student's t-test. The calculated t-value is compared with the tabular (critical) value of this indicator from the corresponding table, taking into account the given level of significance and the number of degrees of freedom. The critical value can be obtained with the function TINV(probability; degrees_freedom).

3. Matrix of pair correlation coefficients. The analysis is performed using the “Data Analysis” tool, in which “Correlation” is selected. Each pair correlation coefficient is evaluated statistically by comparing its absolute value with the tabular (critical) value. If the calculated pair correlation coefficient exceeds the critical one, we can say, at the given level of probability, that the hypothesis that the linear relationship is significant is not rejected (that is, the null hypothesis of no linear relationship is rejected).
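The link between the t-test in step 2 and the critical r used earlier can be written out as a short worked calculation. The formulas are the standard ones for testing a pair correlation coefficient; the values r = 0.716 and n = 25 come from the example above, and the critical t of about 2.807 is the two-sided table value for p = 0.01 and 23 degrees of freedom (the value that TINV(0.01; 23) would be expected to return).

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.716\sqrt{23}}{\sqrt{1-0.716^{2}}}
  \approx 4.92
  \;>\; t_{\text{crit}} \approx 2.807

r_{\text{crit}} = \frac{t_{\text{crit}}}{\sqrt{t_{\text{crit}}^{2} + (n-2)}}
  = \frac{2.807}{\sqrt{2.807^{2} + 23}}
  \approx 0.505
```

The two comparisons are equivalent: the calculated t exceeds the critical t exactly when the calculated r exceeds the critical r, which is about 0.51 here, as in the table lookup.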

Conclusion

Using the method of correlation analysis in scientific research makes it possible to determine the relationship between various factors and resultant indicators. Bear in mind that a high correlation coefficient can also be obtained from an absurd pair of variables or data set, and therefore this type of analysis should be carried out on a sufficiently large data array.

After obtaining the calculated value of r, it is advisable to compare it with the critical r to confirm the statistical reliability of the value. Correlation analysis can be carried out manually using formulas or with software tools, in particular MS Excel. There you can also build a scatter diagram to visualize the relationship between the studied factors and the resultant variable.