Using Science to Inform Educational Practices

Methods of Data Collection

Regardless of the method of research, data collection will be necessary. The method of data collection selected will primarily depend on the type of information the researcher needs for their study; however, other factors, such as time, resources, and even ethical considerations can influence the selection of a data collection method. All of these factors need to be considered when selecting a data collection method because each method has unique strengths and weaknesses. We will discuss the uses and assessment of the most common data collection methods: observation, surveys, archival data, and tests.


Observation

The observational method involves watching and recording specific behaviors of participants. In general, observational studies have the strength of allowing researchers to see for themselves how people behave. However, observations may require more time and personnel than other data collection methods, often resulting in smaller samples of participants. Researchers may spend significant time waiting to observe a behavior, or the behavior may never occur during observation. It is also important to remember that people tend to change their behavior when they know they are being watched (known as the Hawthorne effect).

Observations may be done in a naturalistic setting to reduce the likelihood of the Hawthorne effect. During naturalistic observations, the participants are in their natural environment and are usually unaware that they are being observed. For example, observing students participating in their class would be a naturalistic observation. The downside of a naturalistic setting is that the researcher doesn’t have control over the environment. Imagine that the researcher goes to the classroom to observe those students, and there is a substitute teacher. The change in instructor that day could impact student behavior and skew the data.

If controlling the environment is a concern, a laboratory setting may be a better choice. In the laboratory environment, the researcher can manage confounding factors or distractions that might impact the participants’ behavior. Of course, maintaining a laboratory setting carries expenses that naturalistic observation does not, which increases the cost of the study. And because participants in a laboratory typically know they are being observed, the Hawthorne effect may again impact behavior.


Surveys

Surveys are familiar to most people because they are so widely used. Surveys make participants easier to reach because they can be conducted in person, over the phone, through the mail, or online, and researchers commonly use them to gather information on many variables in a relatively short period of time.

Most surveys involve asking a standard set of questions to a group of participants. In a highly structured survey, subjects are forced to choose from a response set such as “strongly disagree, disagree, undecided, agree, strongly agree” or “0, 1-5, 6-10, etc.” One of the benefits of forced-choice items is that each response is coded so that the results can be quickly entered and analyzed using statistical software. While this type of survey typically yields surface information on a wide variety of factors, it may not allow for an in-depth understanding of human behavior.
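To make the idea of coding forced-choice responses concrete, here is a minimal sketch in Python. The 1-to-5 coding scheme, the function name code_responses, and the six sample answers are all assumptions made up for illustration; real studies define their own codebook.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding scheme: each response option maps to a number.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "undecided": 3,
    "agree": 4,
    "strongly agree": 5,
}


def code_responses(responses):
    """Convert text responses to numeric codes, ignoring case and stray spaces."""
    return [LIKERT_CODES[r.strip().lower()] for r in responses]


# Made-up answers from six participants to a single survey item.
raw = ["Agree", "strongly agree", "Undecided", "agree", "Disagree", "Agree"]
codes = code_responses(raw)

print(codes)           # numeric codes, ready for statistical software
print(mean(codes))     # average rating for the item
print(Counter(codes))  # how many participants chose each option
```

Because every answer falls into one of five known categories, summarizing the item takes only a couple of lines. Open-ended answers would first need a person to read and categorize each one.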

Of course, surveys can be designed in a number of ways. Some surveys ask open-ended questions, which let each participant devise their own response and so produce a wide variety of answers. This variety may provide deeper insight into the subject than forced-choice questions, but it makes comparing answers challenging. Imagine a survey question that asked participants to report how they are feeling today. If there were 100 participants, there could be 100 different answers, which would be more challenging and time-consuming to code and analyze.

Surveys are useful in examining stated values, attitudes, and opinions, and in reporting on practices. However, they are based on self-report, and this can limit accuracy. For a variety of reasons, people may not provide honest or complete answers. Participants may be concerned with projecting a particular image through their responses, they may be uncomfortable answering the questions, they may inaccurately assess their own behavior, or they may lack awareness of the behavior being assessed. So, while surveys can provide a lot of information from many participants quickly and easily, self-reports may not be as accurate as data from other methods.

Content Analysis of Archival Data

Content analysis involves looking at media such as old texts, pictures, commercials, lyrics, or other materials to explore patterns or themes in culture. Examples of content analysis include the classic history of childhood by Aries (1962), “Centuries of Childhood,” and analyses of television commercials for sexual or violent content or for ageism. Passages in text or television programs can also be randomly selected for analysis. One advantage of analyzing work such as this is that the researcher does not have to go through the time and expense of finding respondents. However, the researcher cannot know how accurately the media reflects the actions and sentiments of the population.

Secondary content analysis, or archival research, involves analyzing information that has already been collected or examining documents or media to uncover attitudes, practices, or preferences. There are a number of data sets available to those who wish to conduct this type of research. The researcher conducting secondary analysis does not have to recruit subjects but does need to know the quality of the information collected in the original study. Unfortunately, the researcher is also limited to the questions asked and the data collected originally.


Tests

Many variables studied by psychologists, perhaps the majority, are not straightforward or simple to measure. These kinds of variables are called constructs and include personality traits, emotional states, attitudes, and abilities. Psychological constructs cannot be observed directly. One reason is that they often represent tendencies to think, feel, or act in certain ways. For example, to say that a particular college student is highly extroverted does not necessarily mean that she is behaving in an extroverted way right now. Another reason psychological constructs cannot be observed directly is that they often involve internal processes, like thoughts or feelings. For these psychological constructs, we need another means of collecting data. Tests serve this purpose.

A good test will aid researchers in assessing a particular psychological construct. What is a good test? Researchers want a test that is standardized, reliable, and valid. A standardized test is one that is administered, scored, and analyzed in the same way for each participant. This minimizes differences in test scores due to confounding factors, such as variability in the testing environment or scoring process, and assures that scores are comparable. Reliability refers to the consistency of a measure. Researchers consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (interrater reliability). Validity is the extent to which the scores from a measure represent the variable they are intended to. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to.
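Two of the consistency checks described above can be sketched numerically. The example below is only an illustration under stated assumptions: it uses the Pearson correlation between two testing sessions as a test-retest index and Cronbach's alpha as an internal-consistency index. Both are common choices, but the text does not prescribe them, and all scores shown are made up.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per participant."""
    k = len(rows[0])                       # number of items on the test
    item_columns = list(zip(*rows))        # regroup scores by item
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical total scores for five participants tested twice.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 16, 17, 21]
retest_r = pearson_r(time1, time2)

# Hypothetical answers of five participants to a four-item scale.
item_scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(item_scores)

print(round(retest_r, 2))  # close to 1: scores are stable over time
print(round(alpha, 2))     # close to 1: items agree with one another
```

Values near 1 on either index suggest consistency, but as the next section explains, consistency alone does not show that the test measures the intended construct.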

There are various types of tests used in psychological research. Self-report measures are those in which participants report on their own thoughts, feelings, and actions, such as the Rosenberg Self-Esteem Scale or the Big Five Personality Test. Some tests measure performance, ability, aptitude, or skill, like the Stanford-Binet Intelligence Scale or the SAT. There are also tests that measure physiological states, including electrical activity or blood flow in the brain.

Video 2.5.1. Methods of Data Collection explains various means for gathering data for quantitative and qualitative research.

Reliability and Validity

Reliability and validity are two important considerations that must be taken into account with any type of data collection. Reliability refers to the ability to consistently produce a given result. In the context of psychological research, this would mean that any instruments or tools used to collect data do so in consistent, reproducible ways. Unfortunately, being consistent in measurement does not necessarily mean that you have measured something correctly. To illustrate this concept, consider a kitchen scale used to measure the weight of the cereal that you eat in the morning. If the scale is not properly calibrated, it may consistently under- or overestimate the amount of cereal being measured. While the scale is highly reliable in producing consistent results (e.g., the same amount of cereal poured onto the scale produces the same reading each time), those results are incorrect. This is where validity comes into play. Validity refers to the extent to which a given instrument or tool accurately measures what it’s supposed to measure. While any valid measure is by necessity reliable, the reverse is not necessarily true. Researchers strive to use instruments that are both highly reliable and valid.
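The kitchen scale example can be expressed as a short sketch. The true weight of 40 grams, the 5-gram calibration error, and the function name miscalibrated_scale are hypothetical values chosen only to illustrate the point.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 40.0  # grams of cereal actually on the scale (hypothetical)


def miscalibrated_scale(true_weight, offset=5.0):
    """A scale that always reads `offset` grams too high (assumed error)."""
    return true_weight + offset


# Weigh the same bowl of cereal five times.
readings = [miscalibrated_scale(TRUE_WEIGHT) for _ in range(5)]

spread = pstdev(readings)            # variation between repeated readings
bias = mean(readings) - TRUE_WEIGHT  # distance from the true weight

print(readings)  # [45.0, 45.0, 45.0, 45.0, 45.0]
print(spread)    # 0.0 -> perfectly consistent, so highly reliable
print(bias)      # 5.0 -> consistently wrong, so not valid
```

A spread of zero shows the scale is reliable, while the nonzero bias shows that its readings still do not reflect the true weight.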

Everyday Connection: How Valid Is the SAT?

Standardized tests like the SAT are supposed to measure an individual’s aptitude for a college education, but how reliable and valid are such tests? Research conducted by the College Board suggests that scores on the SAT have high predictive validity for first-year college students’ GPA (Kobrin, Patterson, Shaw, Mattern, & Barbuti, 2008). In this context, predictive validity refers to the test’s ability to effectively predict the GPA of college freshmen. Given that many institutions of higher education require the SAT for admission, this high degree of predictive validity might be comforting.

However, the emphasis placed on SAT scores in college admissions has generated controversy on a number of fronts. For one, some researchers assert that the SAT is a biased test that places minority students at a disadvantage and unfairly reduces their likelihood of being admitted into a college (Santelices & Wilson, 2010). Additionally, some research has suggested that the SAT’s ability to predict the GPA of first-year college students is grossly exaggerated. In fact, it has been suggested that the SAT’s predictive validity may be overestimated by as much as 150% (Rothstein, 2004). Many institutions of higher education are beginning to consider de-emphasizing the significance of SAT scores in making admission decisions (Rimer, 2008).

In 2014, College Board president David Coleman expressed his awareness of these problems, recognizing that college success is more accurately predicted by high school grades than by SAT scores. To address these concerns, he has called for significant changes to the SAT exam (Lewin, 2014).



Educational Psychology Copyright © 2020 by Nicole Arduini-Van Hoose is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
