TEST ARSENAL

1. INTRODUCTION

This paper presents all the psychometric tests developed and distributed by Integrity International (INTEG) and describes them in a generalized and structured way, providing a bird's eye view of their most critical aspects.

Although INTEG specializes in Integrity in the World of Work in the South African market, it has also developed other products that contribute further to assessing Integrity from different angles and that are dedicated to specific areas such as cognition and personality.

The following aspects are covered:

I. A short description of the test, including the number of items, the structure of the instrument, the time required to complete it, the measuring areas (scales) it provides and the purpose it serves.

II. Psychometric Properties of the instrument, including reliability, validity, bias, fairness and readability.

III. General use and decision-making rules advised by the developer.

IV. An example of the Summarized Report generated by the test.

V. Technical Manual.

VI. Norms.

 

Given the technical detail, uniqueness, comprehensiveness and volume of the material, as well as the overlap between some facets, the last two subjects (the Technical Manual and Norms) are combined and presented first, together with a general introduction to subject II, Psychometric Properties.


 2. TEST DEVELOPMENT MODEL

Before presenting these specifics for the individual tests, a brief outline is given of the seven steps of the traditional, well-established test development model that was used in developing the INTEG tests.

2.1 Conceptualizing: What are we looking for?

2.2 Operationalizing: How would this show itself?

2.3 Quantifying: How can we attach a value to what we have observed?

2.4 Pilot testing: How does the test behave in practice?

2.5 Item analysis: Does each item contribute properly to the total score?

2.6 Norm development and interpretation: What does this score mean? (Develop and maintain norms.)

2.7 Evaluation of test: Is the assessment process consistent and accurate? (Is it reliable and valid?)

A detailed description of each step for each test does not fit the format of this paper. However, because INTEG specializes in Integrity and the IP200 is its flagship instrument in this field, Addendum A describes in more detail how these steps were applied in developing the IP200. It serves as a practical example of how the INTEG tests are composed and of how the steps were applied in developing all the other tests. Because even the summarized information in this addendum runs to some 40 pages, a condensed version is provided here under the heading "Main Moments in Developing the IP200".


 “MAIN MOMENTS IN DEVELOPING THE IP200

1. The IP200 was originally developed to serve as a Counseling Instrument.

2. The concept of Integrity was first studied thoroughly and in depth (in effect, the conceptualizing phase).

3. The results of this initial phase were used to differentiate between behaviors, attributes, attitudes, etc. that present as lower or higher in terms of this concept. In effect, this established a criterion of Integrity.

4. The criterion so established was used to produce a 'normal' distribution of employed people (irrespective of their jobs, experience, qualifications, industry, race, gender, nationality, etc.) along the integrity continuum.

5. The research team produced a total of 21,192 independent anecdotes/narratives. Of these, 1,212 were submitted to statistical analysis, yielding 202 phrases/items that differentiated significantly between the poor- and good-Integrity populations. Through expert interpretation and combination these were reduced to the 200 items that make up the test.

6. On independent analysis, each of these items differentiated significantly relative to an integrity-related criterion, as described above.

7. A factor analysis of the 200 selected, differentiating anecdotes produced 40 identifiable groups (Areas). A further factor analysis of these 40 Areas produced 8 groups, representing the 8 Substructures of the IP test.

8. In the statistical analyses for developing and introducing the IP200, only valid and reliable data were used. Only results from testees who understood the test and did not try to manipulate its outcome were included, i.e. those with a Consistency score above a sten of 6 and a Lie-Detector score above a sten of 7.

9. In line with the so-called 'Full Service Policy', it was decided to show the 4 items with the highest Face Validity as the so-called 'Loading Items/Factors' under each Area in the Categorization Document. This was done to make the instrument more effective for counseling among its users, notwithstanding the statistical results.

10. Given that all the test items differentiate significantly on the Integrity construct, a Replacement Regression Analysis (RRA) was conducted on all 170 items to ensure an optimal loading on each Area.

11. The above introduced the Multiple Order Approach (MOA) to selecting items and loading them on the specific Areas to be assessed. Under this approach, each of the four so-called Loading Items functions as a composite unit of items. Each item in the unit contributes to a different degree, according to its 'true' component for the particular observation (item) and the extent to which it explains the variance of the construct/Area it loads on.

12. The MOA used in developing the IP200 had a significant impact on Step 5 of the development process, the Item Analysis, which determines whether each item contributes properly to the total score. Each so-called Loading Item consists of an integrated set of items, optimized to eliminate (so to speak) the error component of each item/observation, with a maximum of three observations/items per Loading Item.

13. The experimental instrument (the IP200) was applied in practice to evaluate its consistency and accuracy (i.e. its reliability and validity), and was finally submitted to the HPCSA for registration."
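The data-screening rule in point 8 amounts to a simple filter. The minimal sketch below uses a hypothetical record layout and hypothetical field names. It assumes, as point 8 implies, that higher sten scores on both the Consistency and Lie-Detector scales indicate a more trustworthy protocol, and it reads "above" as strictly greater than.

```python
from dataclasses import dataclass


@dataclass
class Protocol:
    """One completed test; field names are hypothetical."""
    testee_id: str
    consistency_sten: int   # sten scale, 1-10
    lie_detector_sten: int  # sten scale, 1-10


def is_usable(p: Protocol, min_consistency: int = 6, min_lie_detector: int = 7) -> bool:
    """Keep a protocol only if both validity scales are above their thresholds."""
    return p.consistency_sten > min_consistency and p.lie_detector_sten > min_lie_detector


protocols = [
    Protocol("A", consistency_sten=8, lie_detector_sten=9),
    Protocol("B", consistency_sten=6, lie_detector_sten=9),  # consistency not above 6
    Protocol("C", consistency_sten=7, lie_detector_sten=7),  # lie-detector not above 7
]
usable = [p.testee_id for p in protocols if is_usable(p)]
print(usable)  # -> ['A']
```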


  3. TECHNICAL MANUAL

The Technical Manuals of most of the tests are hefty documents. They cover the information summarized in this paper in great detail and run to hundreds of pages. Their technical research content is covered in the Training Manual, and it is suggested that this material form part of a fully fledged training program. It is nevertheless available on request.


 4. NORMS

In line with generally accepted international practice, the developers of the INTEG measuring instruments use an integrated approach to norming, composing a unitary norm for all their products. This concept is derived from the so-called 'Integrated Multiple Composed Norm (IMCN)' that was initiated by the European Psychometric Convention (EPC) of 1994.

Accomplishing this involves six steps that make up a complex, longitudinal process, but the effort is justified by the clear benefits of the results.

This does not replace the individual norming process, which is still carried out and reported in the Technical Manual of each test. Individual norming is an ongoing process that depends on the populations, circumstances and intended uses of the test in question.
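Norm tables typically convert raw scores to a standard scale such as the sten (standard ten) scale used elsewhere in this paper. The sketch below shows one conventional conversion against a single norm group: sten = 2z + 5.5, rounded half-up and limited to 1-10. It is a generic illustration under that assumption, not a description of the IMCN procedure, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def build_norm(raw_scores: list[float]) -> tuple[float, float]:
    """Return the mean and sample standard deviation of a norm group."""
    return mean(raw_scores), stdev(raw_scores)


def to_sten(raw: float, norm_mean: float, norm_sd: float) -> int:
    """Convert a raw score to a sten relative to the given norm group."""
    z = (raw - norm_mean) / norm_sd
    # Rounding 2z + 5.5 half-up is the same as flooring 2z + 6.
    sten = math.floor(2 * z + 6)
    return max(1, min(10, sten))


norm_mean, norm_sd = build_norm([40, 50, 60])  # mean 50, SD 10
print([to_sten(x, norm_mean, norm_sd) for x in (10, 45, 50, 55, 80)])  # -> [1, 5, 6, 7, 10]
```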


5. PSYCHOMETRIC PROPERTIES

This general introduction covers the models used to determine the psychometric properties of the INTEG measuring instruments. The results obtained for each test are reported in general terms when that test is presented in this paper.

5.1     Reliability

Reliability is the consistency with which an instrument measures, i.e. the extent to which a test yields the same results under different conditions. If a measure shows little consistency, it is doubtful whether it measures anything of substance. Reliability is therefore the first, acid 'test' an instrument must pass among the successive psychometric hurdles that an accurate, successful and effective test must clear, in both a statistical and a practical sense.

With this as a general background, the following reliability models were applied in the development of each INTEG-test:

5.1.1     Coefficient of Internal Consistency, where a split-half approach to the test's items was used to determine how internally consistent the instrument is.

5.1.2   Coefficient of Stability, where a test-retest approach was used to determine the consistency of the instrument when applied to the same group of people on two or more occasions, i.e. how stable the test is over time.

5.2       Validity

The validity of a measure is the extent to which the instrument measures what it claims, or is supposed, to measure. In other words, validity concerns the extent to which the measure is free of irrelevant or contaminating influences. Validity can thus be expressed as the ratio of the relevant component of score variance to the total (observed) score variance, so the larger the irrelevant component, the lower the validity. Another name for this irrelevant component is 'bias'. Because the relevant component is only part of the consistent (reliable) component of the score, the validity of an instrument cannot be greater than its reliability. This justifies the primary importance placed on reliability above.
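The argument can be written in classical test theory terms. It assumes, as the paragraph does, that observed-score variance splits into a relevant part, a systematic irrelevant part (bias) and random error. The labels below are introduced for illustration only: X is the observed score, R the relevant component, B bias and E random error.

```latex
\sigma^2_X = \sigma^2_R + \sigma^2_B + \sigma^2_E

\text{reliability} = \frac{\sigma^2_R + \sigma^2_B}{\sigma^2_X},
\qquad
\text{validity} = \frac{\sigma^2_R}{\sigma^2_X} \le \text{reliability}

r_{XY} \le \sqrt{r_{XX'}}
```

The last line is the corresponding classical bound for the case where validity is reported as a correlation r_XY with an external criterion Y. Here r_XX' is the reliability coefficient, and the validity coefficient cannot exceed its square root.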

With this as a general background, the following validity models were applied in the development of each INTEG test. All are important, although they apply differently in different contexts and therefore require different kinds of evidence:

5.2.1    Construct Validity

Determining the extent to which the instrument produces results that are in line with what is already known in the particular field of study. A proven and popular approach here is discriminant validity: the instrument should not correlate with measures known to be independent of the construct. Similarly, factor analysis is used to determine the extent to which the instrument shows a factor structure similar to that found in other tests of the same (or a related) construct.

5.2.2    Content Validity

Determining to what extent the content of the instrument accurately reflects the domain it assesses.

5.2.3    Criterion-Related Validity

Determining to what extent the results generated by the instrument relate to some sound, reliable and valid external criterion of success in the particular field. There are two forms of criterion-related validity:

Concurrent Validity

-          determining the extent to which the instrument successfully distinguishes between known groups relative to the criterion of success.

Predictive Validity

-          determining the extent to which the instrument successfully predicts how (unknown) groups may differ in the future regarding the selected criterion of success.

5.2.4    Face Validity

Face Validity forms part of Content Validity. It is the extent to which the instrument appears (especially to the uninformed) to do what it claims to do, i.e. whether the instrument and its items seem appropriate.

5.3        Bias, Fairness & Discrimination

Bias is best described as systematic error in measurement or research that affects one group (e.g. by race, age or gender) more than another. Unlike random error, bias can be controlled for.

Fairness, on the other hand, is the extent to which assessment outcomes are used in a way that does not discriminate against particular individuals or groups. The two concepts clearly overlap, and in developing the INTEG tests the so-called 'Norming Process' was applied as a practical statistical approach. A wide variety of factors known to be sensitive to bias, fairness and/or discrimination (such as age, gender, ethnicity and language) were each divided into two categories (e.g. young and old). The test results within each category were then correlated with a multi-dimensional external success criterion.

If the correlations obtained differ significantly between the two sub-groups of a factor, the instrument is considered likely to measure or predict unfairly with respect to that factor. If they do not differ significantly, it is not. This model is known as the 'Sub-Division Norming Process'.
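One standard way to decide whether the validity coefficients of two sub-groups 'differ significantly' is Fisher's r-to-z test for two independent correlations. The source does not name the test INTEG uses, so the sketch below swaps in this standard technique. The function names and the conventional 0.05 significance level are assumptions.

```python
import math
from statistics import NormalDist


def compare_subgroup_validity(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Fisher z test of H0: the two independent correlations are equal.

    r1, r2 are test-criterion correlations in the two sub-groups
    (e.g. younger and older testees); n1, n2 are the sub-group sizes.
    Returns (z statistic, two-tailed p value).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher's r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def possibly_unfair(r1: float, n1: int, r2: float, n2: int, alpha: float = 0.05) -> bool:
    """Flag the factor when the sub-group validities differ significantly."""
    return compare_subgroup_validity(r1, n1, r2, n2)[1] < alpha
```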

5.4        Readability

Although language per se is not categorized as a psychometric property (except as a known 'sensitive factor' for fairness), it can play a decisive role in test administration and interpretation. To minimize the differential effect of the language used in a test, language experts are consulted and practical trial runs are conducted. In the COPAS this goes as far as eliminating verbal items entirely. In addition, the Fry Readability Graph was used during test development to keep the language at a low level of complexity. Language is also always included as a known sensitive factor in the 'Sub-Division Norming Process' during the seventh and last step of the Test Development Model. The language issue in verbal/text-based tests receives continuous attention: all items are analysed statistically and feedback is gathered from users in different situations. This ensures that the items, words and sentences in each test are properly understood and serve their intended purpose.
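The Fry Readability Graph is read from two coordinates: the average number of sentences and the average number of syllables per 100 words. The sketch below estimates both for a passage by scaling whole-passage counts to 100 words. The manual procedure instead uses separate 100-word samples. The vowel-group syllable counter is a rough heuristic, the lookup of the grade level on the graph itself is not reproduced, and all names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def fry_coordinates(text: str) -> tuple[float, float]:
    """Return (sentences per 100 words, syllables per 100 words)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    scale = 100 / len(words)
    return len(sentences) * scale, syllables * scale
```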


In summary, the following six actions are taken to ensure effective and optimal 'Readability' in text-based tests:

-            Using Language Experts to formulate texts during test development.

-            Applying the Fry Readability Graph during test development.

-            Using 'Language' as a given 'critical/sensitive' factor in the 'Sub-Division Norming Process' during the Evaluation of the Test, the last (7th) step of the Test Development Model.

-            Performing continuous statistical analysis on items used in text-based tests.

-            Gathering and applying 'post-mortem' information on tests used in practice, especially when tests are applied for the first time to particular groups or under specific circumstances or conditions.

-            Translating tests when necessary.


 6. INTEG PRODUCTS

INTEG offers a wide range of products. Sixteen are dedicated to its field of specialty, Integrity, and fourteen to other fields. The following seven tests make up the full arsenal of Integrity psychometric instruments, together with the purpose each serves:

 INTEGRITY TESTS

DEDICATED INTEGRITY TESTS

Measuring Instruments: Seven Psychometric Tests

    Test        Full name                                  Specialist purpose
  • IP200       Integrity Profile-200                      Diagnostic Integrity
  • IMI         Integrity Measuring Instrument             Screening/Selection
  • IP:Culteg   Integrity Profile: Culture of Integrity    Development
  • BIP         Basic Integrity Profile                    Shortlisting
  • GIP         General Integrity Profile                  General (Non-Work)
  • OCB         Organizational Citizenship Behaviour       Disposition to assist co-workers and the organisation
  • CWB         Counterproductive Work Behaviour           Disposition to counterproductive work behaviour in general

Measuring Instruments: Supportive Instruments

  • Integrity Questionnaire.
  • Integrity Structured Interview.
  • Integrity In-basket.
  • Integrity Assessment Centre.

Training and Development Material

  • Nine Integrity Training Modules.
  • Nine Integrity Posters.
  • Integrity Instructor Manuals.
  • Culture of Integrity Model.
  • Certification Process and Certificates.

Reports

  • Five Structured Integrity Summarised Reports.
  • Seven Personalised and Interpretive Reports, i.e. five on the five Summarised Reports above, one on the IP:Culteg and one on the general training status and results.

OTHER PSYCHOMETRIC TESTS

In addition to these sixteen products dedicated solely to Integrity, INTEG also offers products that contribute further to assessing Integrity from different angles, as indicated by the specific subjects they are dedicated to, as mentioned below:

Measuring Instruments: Supportive Instruments

As an international leader in the assessment field, INTEG conducts the following four 'standardised' ASSESSMENT CENTRES (AC):

  • One-Day AC,
  • Standard 1½-day AC,
  • Flagship 2½-day AC, and
  • Executive 3½-day Assessment and Development Centre (ADC).

All four ACs include THE TROIKA as psychometric 'tools', i.e. the IP200, COPAS and PAW.

The ACs include a capita selecta of the following 'life' assessment exercises: case studies, in-baskets, formal presentations, group discussions, role plays, and self and collegial assessments.

Training and Development Material

  • Wide range of Training and Development Modules.

Reports

  • Standardised AC-reports for Corporate and Individual feedback.
  • Standardised Feedback Reports on psychometric tests.

A total of nineteen psychometric tests!

MORE ABOUT THE TESTS DEDICATED TO INTEGRITY

Research has shown that integrity is a broad, complex concept whose various sub-spheres must all be covered in order to explain its total variance effectively and to a significant level. At best, about 30 items will explain only approximately 50% of the variance, 50 items 60%, 100 items 75%, 200 items 88% and 400 items about 93%. This background should help readers appreciate the nature and 'length' of the following tests, which cover all the angles and purposes of Integrity in practical terms.
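The diminishing returns described above appear broadly consistent with the Spearman-Brown prophecy formula. This formula predicts the reliability of a test lengthened by a factor k from the reliability of a shorter one. The sketch below takes 50% at about 30 items as the starting point and treats the 'variance explained' figures as reliabilities. Both are assumptions for illustration. Under them, the formula gives values close to, though not identical with, those quoted.

```python
def spearman_brown(r_base: float, n_base: int, n_new: int) -> float:
    """Predicted reliability after changing test length from n_base to n_new items."""
    k = n_new / n_base
    return k * r_base / (1 + (k - 1) * r_base)


for n_items in (30, 50, 100, 200, 400):
    predicted = spearman_brown(0.50, 30, n_items)
    print(f"{n_items:>3} items: {predicted:.1%}")
```

This prints 50.0%, 62.5%, 76.9%, 87.0% and 93.0% for 30, 50, 100, 200 and 400 items respectively.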

Copyright (c) Integrity International 2015. All rights reserved.