Validity is one of the basic criteria in psychodiagnostics for evaluating the quality of tests and techniques, closely related to the concept of reliability. It is used to establish how well a technique measures exactly what it is intended to measure; the more fully the quality under study is captured, the higher the validity of the technique.
The question of validity arises first during the development of the material, and then again after a test or technique is applied, when it is necessary to find out whether the method actually measures the degree of expression of the identified personality characteristic.
Validity is expressed by correlating the results of a test or technique with other characteristics that are also under study, and it can be established comprehensively, using different techniques and criteria. Several types of validity are distinguished: conceptual, construct, criterion, and content validity, each with its own methods for establishing its degree. Sometimes checking this criterion is a mandatory requirement for psychodiagnostic methods whose quality is in doubt.
For psychological research to have real value, it must be not only valid but also reliable. Reliability allows the experimenter to be confident that the measured value is very close to the true value. A valid criterion is important because it indicates that what is being studied is exactly what the experimenter intends. It is important to note that validity may imply reliability, but reliability cannot imply validity: reliable values may not be valid, whereas valid ones must be reliable. This is the whole essence of successful research and testing.
Validity in psychology
In psychology, the concept of validity refers to the experimenter's confidence that a given technique measured exactly what it was meant to measure, and it shows the degree of consistency between the results and the tasks set. A valid measurement is one that measures exactly what it was designed to measure. For example, a technique aimed at determining temperament should measure precisely temperament, and not something else.
Validity is a very important aspect of experimental psychology: it is the indicator that secures the trustworthiness of the results, and it is often the source of the most problems. A perfect experiment must have impeccable validity: it must demonstrate that the experimental effect is caused by manipulation of the independent variable, must be completely consistent with reality, and must yield results that can be generalized without restriction. When we speak of the degree of this criterion, we mean how well the results correspond to the objectives.
Validity testing is carried out in three ways.
Content validity assessment is carried out to find out how well the methodology covers the reality in which the property under study manifests itself. A related component is obvious, or face, validity, which characterizes the degree to which the test matches the expectations of those being assessed. In most methodologies it is considered very important that the assessment participant see an obvious connection between the content of the assessment procedure and the reality of the object being assessed.
Construct validity assessment is performed to determine the degree to which the test actually measures the constructs that are specified and scientifically grounded.
Construct validity has two dimensions. The first is convergent validation, which checks the expected relationship between the results of a technique and characteristics obtained with other techniques that measure the same properties. If several methods exist for measuring a characteristic, a rational approach is to run experiments with at least two of them; if comparing the results yields a high positive correlation, convergent validity can be claimed.
Convergent validation thus determines whether test scores vary in accordance with expectations. The second dimension is discriminant validation: the technique should not measure, or correlate with, characteristics with which there should theoretically be no relationship.
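The logic of convergent and discriminant validation reduces to inspecting correlations between techniques. Below is a minimal sketch in Python; the test names, scores, and thresholds are invented for illustration, and Pearson's r is computed by hand:

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores of the same ten subjects on three techniques.
anxiety_test_a = [12, 18, 9, 22, 15, 11, 20, 17, 8, 14]   # technique being validated
anxiety_test_b = [14, 19, 10, 21, 16, 12, 22, 15, 9, 13]  # established test, same construct
spatial_test   = [18, 25, 14, 20, 30, 12, 22, 16, 27, 19] # theoretically unrelated construct

r_convergent = pearson(anxiety_test_a, anxiety_test_b)    # expected: high positive
r_discriminant = pearson(anxiety_test_a, spatial_test)    # expected: near zero
```

A high convergent correlation together with a near-zero discriminant correlation is the pattern that supports construct validity.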
Validity testing can also be criterion-based: using statistical methods, it determines the degree to which the results agree with predetermined external criteria. Such criteria can be direct measures, methods independent of the test results, or socially and organizationally significant performance indicators. Criterion validity also includes predictive validity, which is used when behavior needs to be forecast; if the forecast is borne out over time, the technique is predictively valid.
Threats
Validity in psychology is a property of a high-quality methodology, but factors may arise that distort even a theoretically well-constructed psychodiagnostic method. Side factors are more pronounced when working with poorly organized stimuli or with tasks that are new and unfamiliar to the subject.
A further difficulty lies in studying unbalanced and insecure individuals. The main threats to high validity are special characteristics of the test taker and situational phenomena.
The reliability of the results is reduced by:
- test subject's errors;
- specialist errors;
- errors caused by conditions or incorrect diagnostics.
If the diagnostic procedure does not strictly require a specialist to be in the room, then his presence may distort the results of the study. Comments on and interpretation of test tasks during administration also reduce the reliability of the data obtained.
A subject who deliberately answers incorrectly, or who tries to present himself in a favorable light to management, distorts the diagnostic results. No less dangerous is the psychophysiological state of the person being tested: for example, the individual may be very hungry, tired, or suffering from a migraine.
Extraneous noise, voices, and the opportunity to discuss test tasks with other subjects also reduce the accuracy of the results; these are errors of diagnostic conditions and procedure.
Test validity
A test is a standardized task whose application yields data about a person's psychophysiological state and personal properties, as well as his knowledge, abilities, and skills.
Validity and reliability of tests are two indicators that determine their quality.
The validity of a test is the degree of correspondence between the quality, characteristic, or psychological property being studied and what the test actually measures.
The validity of a test is an indicator of its effectiveness and applicability to measuring the required characteristic. The highest-quality tests have 80% validity. When validating, it should be taken into account that the quality of the results depends on the number of subjects and their characteristics. It follows that one and the same test can be highly reliable and yet not valid.
There are several approaches to determining the validity of a test.
When measuring a complex psychological phenomenon that has a hierarchical structure and cannot be studied with just one test, construct validity is used. It determines the accuracy with which complex, structured psychological phenomena and personality traits measured by testing are studied.
Criterion validity is a test criterion that determines the psychological phenomenon under study at the present moment and predicts its characteristics in the future. To establish it, the testing results are correlated with the degree of development of the measured quality in practice, for example with specific abilities in a certain activity. If the validity coefficient is at least 0.2, the use of the test is justified.
Content validity is a test criterion that determines how fully the test covers the psychological constructs it is meant to measure and demonstrates the completeness of the set of measured indicators.
Predictive validity is a criterion by which one can forecast the development of the quality under study in the future. This criterion of test quality is very valuable from a practical point of view, but difficulties may arise, since the quality may develop unevenly in different people.
Test reliability is a criterion that measures the consistency of test results across repeated studies. It is determined by retesting after a certain period of time and calculating the correlation coefficient between the results of the first and second testing. It is also important to take into account the peculiarities of the testing procedure itself and the socio-psychological structure of the sample: the same test can have different reliability depending on the gender, age, and social status of the subjects. Reliability estimates can therefore contain inaccuracies and errors arising from the research process itself, so ways are sought to reduce the influence of such factors on testing. A test can be considered reliable if its reliability coefficient is 0.8-0.9.
The validity and reliability of tests are very important because they define the test as a measuring instrument. When reliability and validity are unknown, the test is considered unsuitable for use.
There is also an ethical context to measuring reliability and validity, which is especially important when test results affect decisions with serious consequences for people's lives. Some people are hired and others rejected; some students enter educational institutions while others must first finish their studies; some receive a psychiatric diagnosis and treatment while others are judged healthy. All such decisions are made on the basis of assessments of behavior or special abilities. For example, if a job applicant whose test scores were the decisive factor in hiring finds out that the test was not sufficiently valid and reliable, he will be very disappointed.
Reliability as stability
Stability of test results, or test-retest reliability, is the ability to obtain the same results from the same subjects on different occasions.
Stability is determined using repeated testing (retest):
This method involves carrying out several measurements with the same test, separated by a certain period of time (from a week to a year). If the correlation between the results of the different measurements is high, the test is reliable. The lowest satisfactory value for test-retest reliability is 0.5. However, not every test's reliability can be checked this way, since the quality, phenomenon, or property being assessed may itself be unstable (for example, mood, which can change from one measurement to the next).
Another disadvantage of repeated testing is the habituation effect. Test takers are already familiar with the test and may even remember most of their answers from the previous test.
In connection with the above, the reliability of psychodiagnostic techniques is also studied using parallel forms, for which equivalent (parallel) sets of tasks are constructed. The subjects then perform a formally different but equivalent test under similar conditions. There are, however, difficulties in proving that two forms are truly equivalent. Despite this, parallel test forms have proven useful in practice for establishing reliability.
Validity of a methodology
The validity of a technique is the correspondence between what the technique actually studies and what it is intended to study.
For example, if a psychological technique based on informed self-report is assigned to study a personality quality that cannot be validly assessed by the person himself, such a technique will not be valid.
In most cases, the answers a subject gives to questions about whether a given quality is developed in him express only how the subject perceives himself, or how he would like to appear in the eyes of other people.
Validity is also a basic requirement for psychological methods of studying psychological constructs. There are many types of this criterion, and there is as yet no single opinion on how they should be named or which specific types a technique must satisfy. If a technique proves externally or internally invalid, its use is not recommended. There are two approaches to validating a method.
The theoretical approach consists in showing how truly the methodology measures exactly the quality the researcher intends and is obliged to measure. This is proven through comparison with related indicators and with indicators where no connection could exist. Therefore, to confirm theoretical validity, it is necessary to establish the degree of connection with a related technique (the convergent criterion) and the absence of such a connection with techniques that have a different theoretical basis (discriminant validity).
Assessment of a technique's validity can be quantitative or qualitative. The pragmatic approach evaluates the effectiveness and practical significance of the technique; for its implementation an independent external criterion is used as an indicator of how the quality manifests itself in everyday life. Such a criterion can be, for example, academic performance (for achievement methods and intelligence tests), subjective assessments (for personality methods), or specific abilities such as drawing or modeling (for methods assessing special characteristics).
To validate against external criteria, four types are distinguished: performance criteria (such as the number of tasks completed or the time spent on training); subjective criteria (obtained from questionnaires or interviews); physiological criteria (heart rate, blood pressure, physical symptoms); and criteria of chance (used when the goal is related to or influenced by a particular event or circumstance).
When choosing a research methodology, determining the scope of the characteristics being studied is of theoretical and practical importance as a component of validity. The information contained in the name of a technique is almost never sufficient to judge its scope of application: the name is only a label, and much more lies behind it.

A good example is the proofreading test. The properties it studies include concentration, stability of attention, and psychomotor speed. The technique assesses the severity of these qualities in a person, correlates well with values obtained by other methods, and has good validity. At the same time, the scores obtained from the proofreading test are also influenced by other factors, with respect to which the technique is nonspecific; if it is used to measure them, its validity will be low.

Thus, by defining the scope of application of a methodology, the validity criterion reflects the trustworthiness of the research results. The smaller the number of side factors influencing the results, the higher the reliability of the estimates obtained. The reliability of results is also determined by the set of measured properties, their importance for diagnosing complex activities, and how fully the subject of measurement is represented in the test material. For example, to meet the requirements of validity and reliability, a methodology intended for professional selection must analyze a wide range of indicators that are most important for success in the profession.
Cronbach's alpha
This method, proposed by Lee Cronbach, compares the variance of each item with the overall variance of the entire scale. If all the questions probe the same underlying attribute, their answers are consistent with one another and produce a value that can be considered true; in that case the test is reliable and Cronbach's alpha approaches 1. If no such common value emerges, that is, the answers to the questions are randomly scattered, the test is not reliable and Cronbach's alpha will be close to 0.
Discriminativity
Item discriminativity is defined as the ability to separate subjects with a high overall test score from those with a low score, or subjects with high educational productivity from those with low productivity.
In other words, discriminativity is the ability of test items to differentiate subjects with respect to the "maximum" or "minimum" test result. Determining the discriminativity of a test item is necessary in order to screen out low-quality items.
To calculate discriminativity, the method of extreme groups is used: the results of the most and least successful subjects are compared. This is the simplest and most visual way of calculating discriminativity.
The proportion of subjects assigned to the extreme groups can vary widely depending on the sample size: the larger the sample, the smaller the proportion of subjects one can limit oneself to when forming the high- and low-scoring groups. The lower limit of the group cutoff is 10% of the total sample, the upper limit 33%; in practice a 27% cutoff is often used, since at this percentage the maximum accuracy in determining discriminativity is achieved. The discrimination index is calculated as the difference between the proportions of individuals who correctly solved the item in the "high-productivity" and "low-productivity" groups.
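The extreme-groups calculation described above can be sketched in Python. Everything here is illustrative: the function name, the 27% cutoff default, and the sample data are chosen for the example.

```python
def discrimination_index(results, frac=0.27):
    """Discrimination index of one test item via the extreme-groups method.

    `results` is a list of (item_correct, total_score) pairs, one per subject,
    where item_correct is 1 if the subject solved this item and 0 otherwise.
    Returns p(high-scoring group) - p(low-scoring group).
    """
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    k = max(1, round(len(ranked) * frac))           # size of each extreme group
    top, bottom = ranked[:k], ranked[-k:]
    p_top = sum(item for item, _ in top) / k        # share of correct answers among high scorers
    p_bottom = sum(item for item, _ in bottom) / k  # share of correct answers among low scorers
    return p_top - p_bottom

# Hypothetical sample: (solved this item?, total test score) for ten subjects.
sample = [(1, 95), (1, 90), (1, 88), (1, 75), (0, 70),
          (1, 66), (0, 60), (0, 55), (1, 50), (0, 42)]

d = discrimination_index(sample)  # here: 1.0 - 1/3, about 0.67
```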
The psychometric paradox is a phenomenon that arises when using personality questionnaires. Its essence is that questions (statements) with a high discriminativity index are unstable with respect to repeatability of the result, while, conversely, stable answers are often observed for questions with low discriminativity.
P. Eisenberg (1941) showed that questions that distinguish patients with neurosis from other patients or from healthy people are unreliable; in other words, there is little chance of obtaining the same answer on retesting. At the same time, questions identified as reliable achieved no differentiation of the groups studied, or only unsatisfactory differentiation. This phenomenon, later called the psychometric paradox, was subsequently studied in the works of L. Goldberg (1963) and M. Novakovskaya (1975).
The psychometric paradox cannot be explained without a psychological analysis of how answers to personality-questionnaire items are formed. According to M. Novakovskaya, questions, while remaining formally unchanged, undergo semantic (psychological) transformations both interindividually and intraindividually. Interindividual variability has two causes: differences among subjects in the severity of the measured trait (property) and differences in understanding the meaning of the questions. Intraindividual variability is due to variability of meaning, difficulty in deciding on a response, and fluctuation in trait expression (the latter source of variability can be ignored if the interval between repeated trials is short).
For the psychological interpretation of the psychometric paradox, M. Novakovskaya suggests distinguishing three determinants of responses: the severity of the trait in the subject, the meaning attributed to the question, and the ease of deciding on an answer. She also emphasizes the need to distinguish unambiguous questions from ambiguous ones, which in a certain sense can be likened to projective stimuli.
M. Novakovskaya proposes distinguishing two types of psychometric paradox, type L and type B, and proceeds from the following hypotheses about their origin. A type L paradox arises with questions that can be interpreted in different ways (multiple meanings), as well as when it is difficult to decide on an answer; such questions have a high discriminativity index with considerable variability of the answer. A type B paradox arises with unambiguous questions for which it is easy to find an answer; this also includes so-called one-way diagnostic questions, for which only one type of answer is diagnostically significant. Such questions are characterized by weak discriminativity and weakly expressed variability.
The psychometric paradox must be taken into account when designing (or adapting) personality questionnaires.
Calculating Cronbach's alpha

Cronbach's alpha is defined as

α = K / (K − 1) · (1 − Σ σ²(Yᵢ) / σ²(X)),

where K is the number of items in the scale, σ²(X) is the variance of the total test score, and σ²(Yᵢ) is the variance of item i.

An alternative way to calculate it is:

α = (N · c̄) / (v̄ + (N − 1) · c̄),

where N is the number of items in the scale, v̄ is the average item variance for the sample, and c̄ is the average of all covariances between the items.

Currently, Cronbach's alpha is calculated using SPSS, STATISTICA, and other modern statistical packages; it can also be computed in Microsoft Excel.
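A minimal Python sketch of the first formula follows; the function name and sample data are illustrative. Population variances are used throughout, which is consistent since the same divisor cancels in the ratio:

```python
def cronbach_alpha(items):
    """Cronbach's alpha: K/(K-1) * (1 - sum of item variances / total-score variance).

    `items` is a list of item-score columns: items[i][j] is subject j's score on item i.
    """
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)  # population variance

    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]    # each subject's total score
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

# Hypothetical scale: three items answered by five subjects (scores 1-5).
items = [[3, 4, 2, 5, 4],
         [2, 4, 3, 5, 3],
         [3, 5, 2, 4, 4]]

alpha = cronbach_alpha(items)  # about 0.87 here: the items hang together well
```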