Statistics is the collection, analysis, and interpretation of data. Statistical analysis, therefore, refers to the process of collecting, analyzing, and interpreting data in order to find trends and patterns. Statistics is used in industry, research, and government; for example, manufacturers use statistical knowledge to produce quality fabrics. Statistical analysis is the main component of data analysis and is used to determine patterns in semi-structured and unstructured data (Kothari, 2011).
Statistical analysis is a very important tool for finding solutions when the experimental process is complex. For instance, research studies on turbulence rely on statistical analysis of experimental data: the study of turbulence is very complex at a theoretical level, so scientists use statistical analysis to test their theories. Similarly, it is through surveys and experiments that social scientists are able to confirm their theories. This paper discusses the different methodologies of statistical analysis that are used in the social sciences.
Statistical methods used in social sciences
Empirical research studies in the social sciences need measures that are both reliable and accurate. Data collection in the social sciences may take many forms, including the measurement of opinions, cognitions, and perceptions. Good data analysis requires proper data collection. The biggest mistake that most researchers in the social sciences make is collecting data at a lower level of measurement than is necessary.
To understand and evaluate a research study, it is important to have a clear understanding of the different statistical methods. Most research studies in the social sciences use quantitative statistical methods such as multiple regression and analysis of variance. Other quantitative methods include correlation and regression, inferential statistics, descriptive statistics, and the different measurement scales.
Hierarchical analysis is also known as multilevel modeling. Multilevel models are statistical models whose parameters vary at more than one level, and they extend to non-linear models. The tool is mostly used in social science studies where the data for the participants are organized at more than one level; the units of analysis are usually at the lowest level, typically the individual. Multilevel models offer an alternative to univariate or multivariate analysis of repeated measurements. Where the scores on the variables are adjusted for covariates, multilevel models serve as an alternative to ANCOVA. The tool can also be used on data with several levels.
Structural equation modeling is used for estimating and testing causal relationships using a combination of qualitative causal assumptions and statistical data. Structural equation models are used for both theory development and theory testing. The strengths of the tool are that it can be used for exploration, it can estimate the values of free parameters, and it can construct latent variables. Its major weakness is that it requires extensive planning before it can be used.
Chi-square is used in both nonparametric and parametric statistics. Most computations that involve chi-square use nominal data; for this reason, it is mostly used in situations that involve nonparametric statistics. The main principle behind chi-square is the comparison between expected and observed frequencies. The aim of using chi-square is to determine whether there is a significant difference between the samples (Bernstein & Bernstein, 2009).
The strengths of chi-square include that it is easier to compute than most statistical methods, that it can be used on nominal-scale data, and that it makes no assumption about the distribution of the population. Its limitations are that the groups being measured must be independent, that it is not appropriate when expected frequencies are very small, and that the data must be frequency data (Sheskin, 2007).
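The comparison of observed and expected frequencies described above can be sketched in a few lines of Python. The die-roll counts below are invented purely for illustration; under the null hypothesis of a fair die, each face is expected to come up equally often.

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 120 rolls of a die; a fair die expects 20 per face.
observed = [22, 17, 21, 13, 17, 30]
expected = [20] * 6
print(round(chi_square(observed, expected), 2))  # 8.6
```

The resulting statistic is then compared against a chi-square distribution with (number of categories - 1) degrees of freedom to judge whether the departure from the expected frequencies is significant.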
Path analysis is used to illustrate the directed dependencies among variables. Path analysis involves models that are equivalent to various forms of multiple regression analysis, discriminant analysis, canonical correlation analysis, and factor analysis. It also involves the general family of models in covariance analysis and multivariate analysis. Path analysis can be viewed as a special case of structural equation modeling.
Path analysis refers to a statistical model that examines the strengths of direct and indirect relationships among variables. A major goal of social research is to understand social systems through the explication of causal relationships. Given the complexity of social life, disentangling the relationships among variables is always difficult. Path analysis assists researchers who use correlational data in disentangling the different causal processes underlying a particular outcome.
Descriptive statistics are usually used to explain the characteristics of the data. Descriptive statistics give simple summaries about the measures and the samples and are used to describe the samples of the research study. Descriptive statistics are usually based on the normal curve, against which measures of central tendency are interpreted. The normal curve is evenly distributed on both sides of the mean, where the mean, median, and mode coincide. The distribution is described in terms of standard deviations below and above the mean. For instance, if the mean of a research study is 50 and the standard deviation is 10, then one standard deviation above the mean is 60, while one standard deviation below the mean is 40 (Conway, 2003).
Descriptive statistics are used by researchers to summarize the collected data systematically. They help researchers present their data in a manner that is easy to understand, and in a manner that conceptualizes the general characteristics of the responses of the samples. Descriptive statistics are normally presented first in the results section, followed by inferential statistics. Statistical software aids in the easy calculation of descriptive statistics. The main types of descriptive measures are correlations, graphs and tables, variability, and central tendency (Trochim, 2006).
Central tendency refers to a central value of a probability distribution and is normally called the center of the distribution. The most common measures of central tendency are the mean, the median, and the mode.
The mean is the most common measure of central tendency. Researchers use the mean to determine the average of the scores in a frequency distribution. Its advantages are that it uses every value in the sample and therefore represents the data well, that it is relatively stable across different samples, and that it is closely related to the standard deviation, the most common measure of dispersion. Its disadvantages are that it is sensitive to extreme outliers, especially when the sample size is small, and hence is not appropriate for measuring a skewed distribution, and that it cannot be used for ordinal and nominal data (Babbie, 2009).
The mode is used to determine the most frequent value in the distribution. Its advantages are that it can be calculated easily and that it can be used on nominal-scale data. Its disadvantages are that it is not defined algebraically and that its value fluctuates considerably across samples when the sample size is small (Cox, 2006).
The median is used to determine the midpoint of the distribution. Its advantages are that it is not distorted by skewed data, that it is easy to comprehend, and that it can be used on ordinal, interval, and ratio-scale data. Its disadvantages are that it is not amenable to further mathematical operations and that it does not reflect the precise values of the observations (Nick, 2007).
The standard deviation is obtained from the variance: the square root of the variance gives the standard deviation of the distribution. Scores can then be converted to standard scores that are easily interpreted on the normal curve. The advantages of the standard deviation are that it uses all the data to determine the spread of the distribution, that it is used in hypothesis testing, and that it gives equal weight to negative and positive deviations from the mean. Its disadvantage is that it is relatively difficult to compute and comprehend (Feller, 2010).
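The descriptive measures discussed above (mean, median, mode, and standard deviation) can be computed directly with Python's standard `statistics` module; the test scores below are a made-up sample for illustration.

```python
import statistics as st

# Hypothetical test scores, for illustration only.
scores = [42, 55, 48, 60, 50, 55, 47, 53]

print(st.mean(scores))    # arithmetic average of all scores: 51.25
print(st.median(scores))  # midpoint of the ordered scores: 51.5
print(st.mode(scores))    # most frequent score: 55
print(st.stdev(scores))   # sample standard deviation: square root of the variance
```

Because the mean uses every value, the single low score of 42 pulls it below the median; the mode picks out only the most frequent value (55) and ignores the rest.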
Researchers also use graphs and tables to describe the sample. Graphs and tables summarize the descriptive results; they are used to show the frequencies of participants' responses, standard deviations, means, and percentages. Graphs such as bar graphs are used with ordinal and nominal data to describe the results. A table helps one see the relationship between input and output values, although tabulated data can be difficult to interpret. Graphs help researchers determine the behavior of the variables easily; on the other hand, drawing graphs freehand is tiring, tedious, cumbersome, and prone to inaccuracy (Evans et al., 2004).
Measures of relationships
Researchers also use measures of relationships to present results from the sample. The relationship between two variables is usually defined in linear terms. Correlation ranges from a perfect negative correlation (-1.0) to a perfect positive correlation (+1.0). A high correlation means that the variables in the distribution vary closely together; a low correlation means that they are only weakly related; and a zero correlation means that they are not related to one another at all. A strong correlation thus means that the variables of interest vary together. Most researchers prefer Pearson's r to determine the correlation in a distribution, because it is very sensitive in detecting significant correlations (Feller, 2010).
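Pearson's r can be computed by hand as the covariance of the two variables divided by the product of their spreads. The study-hours and score data below are invented purely to illustrate a strong positive correlation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: scores rise with hours of study -> r close to +1.0.
hours = [1, 2, 3, 4, 5]
score = [52, 58, 61, 67, 72]
print(round(pearson_r(hours, score), 3))  # 0.996
```

Had the scores fallen as the hours rose, the covariance term would be negative and r would approach -1.0; unrelated variables would give a covariance near zero.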
The four main assumptions of inferential statistics are normality, linearity, independence, and homogeneity of variance. Normality refers to the degree to which the scores approximate the normal curve; parametric methods such as the t-test rely on this assumption. Linearity refers to the degree to which two variables in a distribution correlate in a linear manner (Freedman, 2005).
Variables that relate in a circular or curved fashion violate this assumption. Independence concerns whether the variables in the distribution are independent; the assumption holds that the observations should not influence one another. Homogeneity of variance refers to the degree to which the variances of the groups are equal. When these requirements are satisfied, a researcher may use a statistical method that is very sensitive to differences (Asadoorian & Kantarelis, 2005).
To interpret inferential statistics, one must understand the level of significance. The level of significance is the probability of rejecting the null hypothesis when it is in fact true. The null hypothesis states that there is no difference between the samples being compared. For instance, one may compare the impact of a stress-training activity on reported levels of stress (Moore & McCabe, 2003).
Here, the null hypothesis would state that there is no difference in reported stress levels with or without the stress program. Inferential statistics are used to reach conclusions about whether the results obtained from a sample can be generalized to the target population. The most common types of inferential statistics are analysis of variance, the t-test, analysis of covariance, multivariate analysis of variance, and multiple regression (Tabachnick & Fidell, 2007).
The t-test is mostly used to compare the dependent measures of two groups. For a researcher to use a t-test, the data should be interval- or ratio-scale scores. T-test calculations concern the difference between the two groups being compared. The t-test is easy to perform for two samples but cannot be applied directly to more than two samples. Its use requires a proper understanding of statistics and can be difficult for non-mathematicians (Feller, 2010).
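The two-group comparison the t-test performs can be sketched as follows, using the standard pooled-variance form of the independent-samples t statistic; the control and treatment scores are hypothetical interval-scale data chosen for illustration.

```python
from math import sqrt
from statistics import mean, variance

def t_independent(x, y):
    """Pooled-variance t statistic for two independent samples."""
    nx, ny = len(x), len(y)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))

# Hypothetical interval-scale scores for two independent groups.
control   = [12, 14, 11, 13, 15]
treatment = [16, 18, 15, 17, 19]
print(round(t_independent(treatment, control), 2))  # 4.0
```

The statistic is then compared against a t distribution with nx + ny - 2 degrees of freedom; the larger its absolute value, the stronger the evidence against the null hypothesis of equal group means.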
Analysis of variance
Analysis of variance is also used to compare groups. The dependent variable must be on an interval or ratio scale. Like the t-test, analysis of variance is used to compare samples, but it is better than the t-test in that it can make simultaneous comparisons among two or more means; it generalizes the t-test to more than two groups. The main advantage of analysis of variance is that it opens up several testing capabilities. One disadvantage is that a significant result indicates only that the group means differ somewhere, not which particular groups differ, so follow-up comparisons are still required (Cox, 2006).
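The generalization to more than two groups works by comparing variation between the group means with variation within the groups. The sketch below computes the one-way ANOVA F statistic for three hypothetical groups, a comparison the two-sample t-test could not make in a single step.

```python
from statistics import mean

def one_way_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    k, n = len(groups), len(scores)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scores for three groups, compared simultaneously.
g1 = [12, 14, 11, 13, 15]
g2 = [16, 18, 15, 17, 19]
g3 = [13, 15, 12, 14, 16]
print(round(one_way_f(g1, g2, g3), 2))  # 8.67
```

A large F means the group means are far apart relative to the spread within each group; the value is judged against an F distribution with (k - 1, n - k) degrees of freedom.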
Multivariate analysis of variance is similar to analysis of variance in that it involves comparisons of the variables, but it includes more than one dependent variable. Multivariate analysis models the reality in which a decision involves more than a single variable; for instance, the decision to purchase a house may consider its safety and its size, among other factors (Johnson & Wichern, 2007). Multivariate analysis helps in determining relationships between the variables in an overarching manner.
Multivariate analysis also controls for relationships between variables by using partial correlation, cross-tabulation, and multiple regression to determine the links between the dependent and the independent variables. The main disadvantage of multivariate analysis is that it is complex and involves a level of mathematics that requires a statistical program for data analysis, and most statistical programs are very expensive. The results of the analysis are difficult to interpret and are based on assumptions that are difficult to assess. Lastly, the technique requires a large amount of data; otherwise, the analysis is meaningless (Anderson, 2010).
Measurement scales
A nominal scale is used in situations where a researcher assigns measurements and observations to distinct categories. A nominal scale simply places perceptions, events, or people into groups based on a common trait, and it supports very few mathematical operations. Examples of nominal data include gender and ethnicity: data such as female versus male, Americans versus Africans, or blondes versus brunettes are typically suited to nominal scales. Nominal-scale data therefore involve counts. Researchers usually assign values to nominal categories in order to perform a limited number of mathematical operations (Snedecor & Cochran, 2009).
For instance, they usually assign values to different ethnic groups. Nominal-scale data are often used in the analysis of variance. The main advantages of nominal scales are that they are simple and do not require mathematical operations; that they are easy to interpret, even for those who are poor in mathematics; and that nominal data can be presented in pictorial form, making them attractive. The main disadvantage of the nominal scale is that it is the lowest form of measurement and therefore fails to capture some information: it tells us nothing about the focal object beyond category membership, for example whether an individual is a smoker or a non-smoker, or whether the individual attended college or not (Medhi, 2012).
Researchers use ordinal scales to assign values to data based on order or rank. Ordinal scales are also among the lower scales of measurement. An example of ordinal data is the ranks of employees at a workplace. Ranks are not precise and are not separated by equal units; for example, a manager is not twice as good as a supervisor. Ranks, therefore, are not equidistant from one another.
The main advantages of the ordinal scale over the nominal scale are that it carries all the information captured by the nominal scale, that it arranges the data from lowest to highest, and that instead of merely categorizing objects, it conveys their relative order. Its main disadvantages are that it is still a low form of measurement and that it, too, suffers from a loss of information in the data (Von, 2005).
Interval scales involve the use of numbers with equal units. Even though numbers are used, the scale has no true zero point. For instance, most educational and psychological tests are based on an interval scale. The main advantages of interval scales are that they show the distance of one object from another and that they do not suffer from loss of information. The main disadvantage of the interval scale is that it involves mathematical operations, so non-mathematicians may have problems interpreting the data.
Ratio scales are similar to interval scales in that they are defined in terms of equal units of numbers, but unlike the interval scale, the ratio scale has a true zero point. Weight is the best example of a ratio scale because the measurements begin at zero; another example is temperature measured on the Kelvin scale, which begins at absolute zero. The ratio scale is the most precise of all the measurements discussed here, and a large number of mathematical operations can be performed on it.
The main advantages of the ratio scale are that it carries the richest information and that it has an absolute zero point. Its greatest disadvantage is that it involves thorough computation of the data, so non-mathematicians may not be able to interpret the data as required. A research study can be termed quantitative when its variables have been measured on one of the four scales of measurement.
Every statistical method has advantages and disadvantages. Anyone carrying out a research study in social science should determine the type of statistical method that is suitable for the study; for example, a large amount of data would call for multivariate analysis. Different research studies require different statistical methods. In addition, researchers who are not strong in mathematics are advised not to choose methods that require thorough mathematical computation.
Anderson, T. (2010). An introduction to multivariate statistical analysis. New York: Wiley.
Asadoorian, M. O., & Kantarelis, D. (2005). Essentials of inferential statistics. Lanham: University Press of America.
Babbie, E. (2009). The practice of social research (12th ed.). Wadsworth. pp. 436–440.
Bernstein, S., & Bernstein, R. (2009). Schaum's outline of theory and problems of elements of statistics II: Inferential statistics. New York: McGraw-Hill.
Cohen, J. (2008). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge.
Conway, F. (2003). Descriptive statistics. Leicester: University Press.
Cox, R. (2006). Principles of statistical inference. Cambridge; New York: Cambridge University Press.
Evans, M., et al. (2004). Probability and statistics: The science of uncertainty. Freeman and Company. p. 267.
Feller, W. (2010). An introduction to probability theory and its applications (Vol. 1). Wiley. p. 221.
Freedman, D. (2005). Statistical models: Theory and practice. Cambridge University Press.
Johnson, R., & Wichern, D. (2007). Applied multivariate statistical analysis (6th ed.). Prentice-Hall.
Kothari, C. R. (2011). Research methodology: Methods and techniques. New Delhi: New Age International Ltd.
Medhi, J. (2012). Statistical methods: An introductory text. New York: Wiley.
Moore, D., & McCabe, P. (2003). Introduction to the practice of statistics (4th ed.). W. H. Freeman & Co.
Nick, G. (2007). Descriptive statistics. New York: Springer.
Sheskin, D. J. (2007). Handbook of parametric and nonparametric statistical procedures (4th ed.). Boca Raton, FL: Chapman & Hall/CRC. p. 3.
Snedecor, W., & Cochran, G. (2009). Statistical methods. Ames: Iowa State University Press.
Stevens, S. (2006). On the theory of scales of measurement. Science, 103(2684), 677–680.
Tabachnick, G., & Fidell, S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson International Edition.
Trochim, W. (2006). Descriptive statistics. Research Methods Knowledge Base. Retrieved 14 March 2011.
Von, A. (2005). Review of Cliff and Keats, Ordinal measurement in the behavioral sciences. Applied Psychological Measurement, 29, 401–403.