This tells researchers whether the effect actually exists or was the result of chance. Once enough independent researchers find an effect, it can be tentatively accepted. But that says nothing about the scope of the effect. Does it hold in all cultures? Are there gender or age differences? It could also be that other researchers think the original study had a methodological flaw, and that this is why an effect was (or wasn't) found. These researchers may want to change the method a bit, to create a (in their eyes) more valid way to measure the studied concept. When researchers change the method but try to study the same concept in (slightly) different circumstances, the replication attempt is dubbed a conceptual replication.
As you can hopefully see, replications are a vital part of scientific research. Any one study could produce a "false positive" (or "false negative", but I'll get to that later) in any number of ways without any bad intentions on the researcher's part, so no single study can provide sufficient proof for anything (something to keep in mind when you come across articles about scientific studies in non-scientific media as well). The problem is, some researchers in psychology find it hard to get their replication attempts published, even in journals where the original study was published! As one needs publications to maintain and build an academic career, you can probably imagine how many researchers in psychology attempt replications. Furthermore, studies that don't find the effect they're looking for hardly get published at all. Imagine how many researchers are probably repeating failed studies without even knowing it, because no published material is available on the subject!
So far I've talked about variables that influence findings without any fault of the researcher. But there are a number of things that researchers can do, with or without intent, to influence results. Research in psychology has found time and time again (harharhar) that one's thoughts, ideas and opinions influence one's perceptions, as is the case with confirmation bias. So even without intent, one could lead subjects to give desired responses, exclude subjects who produced unfavourable data, etc. On the former, researchers attempt to standardize interactions with subjects to reduce such effects, or double-blind procedures are employed. But when it comes to the data, there is still a lot of standardization to be achieved.
During one of my practical lectures we received an assignment where we were asked to look through data and discuss which subjects we would exclude. Now, excluding subjects is always ambiguous business. The data were collected online, which means there is a lot less control over the environment the subject is in. So subjects were given a colour-blindness test and some general questions about the subject of the study (what do they think it is) and the amount of noise and distraction in their environment. In the end, the teachers said that instead of excluding subjects solely on the basis of, for instance, their distraction score, we should always look for anomalies in that subject's data and only exclude them if there are any (and report it if the study is published, of course). This seems to make sense: exclude them if their reaction times are all over the place, if they answered a lot of the easy items wrong, or if anything else is really out of the norm. Another thing we should do is run the analysis with and without the excluded subjects to see if the results hold. If they do, we can report the results without the excluded subjects.
This was where some doubt formed in my mind. I may have misunderstood or missed something, but it seems to me that if the results don't change, there really isn't any point in excluding the subjects at all, is there? After reading the required literature for this week, my conviction that this is wrong has strengthened. If we as researchers can't be sure that we're not unconsciously guided by confirmation bias, we shouldn't look at the data before we exclude people. So rules should be set up beforehand for excluding subjects (this is something the teachers mentioned too). In my mind, this means saying, for instance, that subjects with a distraction score of 3 or higher will be excluded, period. No looking at their data. If a researcher is afraid of "losing" too many subjects this way, one could make conditional rules, like: subjects with a distraction score of 3 or higher will be excluded unless this leaves fewer than X subjects. If it leaves fewer than X subjects, subjects with a distraction score of 4 or higher will be excluded instead.
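Such a conditional rule can be written down as a few lines of code before any outcome data is inspected. Here's a minimal sketch of what I mean; the field name `distraction_score` and the specific thresholds are just made-up examples, not taken from any real study:

```python
def select_subjects(subjects, threshold=3, min_n=20, fallback_threshold=4):
    """Apply a pre-registered exclusion rule with a conditional fallback.

    Exclude subjects whose distraction score is `threshold` or higher;
    if that leaves fewer than `min_n` subjects, relax the cutoff to
    `fallback_threshold` instead. The rule never looks at outcome data.
    """
    kept = [s for s in subjects if s["distraction_score"] < threshold]
    if len(kept) < min_n:
        kept = [s for s in subjects if s["distraction_score"] < fallback_threshold]
    return kept
```

Because the rule only reads the distraction score, it can be fixed (and ideally pre-registered) before anyone has seen a single reaction time, which is exactly what keeps confirmation bias out of the exclusion decision.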
Also, testing with and without excluded subjects is interesting, but it increases the chance of statistical error and leaves the door open for researchers to report only the most favourable results while simply stating that the other analysis was also significant.
Thankfully, quite a few researchers are worried about the state of affairs and have proposed numerous ways to remedy the problems. I think data collection and analysis should be as heavily controlled as extraneous variables are in experiments. What will actually be implemented, and how, remains to be seen, but if researchers in the field of psychology cannot use their extensive (although utterly incomplete) knowledge of the human condition to improve research practice, it doesn't bode well for science as a whole.
Following is a list of the required readings for my class that partly inspired this post.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359-1366.
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7, 615-631.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90-100.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 632-638.
Brandt, M. J. (2014). The Replication Recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217-224.
Ioannidis, J. P. A. (2012). Why science is not necessarily self-correcting. Perspectives on Psychological Science, 7, 645-654.