Reliability and Validity Tests: A Complete Guide for Kenyan Postgraduate Students | Tobit Research Consulting
What you will learn:

  • The fundamental difference between reliability and validity, and why you need both
  • The four types of validity that Kenyan university panels assess and how each is established
  • How to conduct a pilot study correctly and use it to test your instruments before main data collection
  • How to calculate and interpret Cronbach’s Alpha in SPSS, including item-total correlations and the “Alpha if Item Deleted” diagnostic
  • When a low alpha should lead to item removal and when it should not
  • How to calculate the Content Validity Index (CVI) using expert ratings
  • How construct validity is established through factor analysis and convergent and discriminant evidence
  • How to report reliability and validity in your Chapter 3 and Chapter 4
  • The exact panel questions your examiners will ask about your instruments’ quality

Reliability and validity are the twin foundations of measurement quality in quantitative research. A questionnaire or other data collection instrument that is neither reliable nor valid produces data that cannot meaningfully answer your research questions — no matter how sophisticated the statistical analysis applied to it. Yet in Kenyan postgraduate research, the treatment of reliability and validity in Chapter 3 is one of the sections most frequently written superficially: students name the concept, cite a textbook definition, state a Cronbach’s Alpha threshold, and move on — without demonstrating that they actually understand what they are measuring or how to establish it systematically.

At Tobit Research Consulting, we work with Masters and PhD students across KU, UoN, JKUAT, MKU, Strathmore, Laikipia, Egerton, Moi, and many other Kenyan universities. Reliability and validity questions are among the most consistent panel corrections we help students address — not because the concepts are arcane, but because the standard required goes well beyond mentioning Cronbach’s Alpha. This guide gives you the complete framework your panel expects to see.


1. Reliability vs. Validity: The Fundamental Distinction Every Postgraduate Student Must Own

Reliability and validity are related but distinct properties of a measurement instrument, and confusing them — or using the terms interchangeably — is a mistake that panels at every Kenyan university will catch. The distinction is both conceptual and practical, and you must be able to state it precisely without notes.

Reliability refers to consistency: a reliable instrument produces the same or very similar results when administered to the same respondents under the same conditions at different points in time, or when different trained raters use it to assess the same phenomenon. Reliability is about the stability and repeatability of measurement. Think of a weighing scale that gives you the same reading every time you step on it — that scale is reliable. Whether it is correctly calibrated to give you your true weight is a separate question.

Validity refers to accuracy: a valid instrument measures what it claims to measure. A scale that consistently reads 5 kg heavier than your true weight is perfectly reliable — it gives you the same wrong answer every time — but it is not valid. In research terms, an instrument can be reliable without being valid, but it cannot be valid without also being reliable. Validity is the higher standard: it requires that the instrument both measures consistently and measures the right thing.

The panel test: If an examiner asks “What is the difference between reliability and validity?”, the answer that passes is: “Reliability is about consistency — does the instrument produce the same results when conditions are repeated? Validity is about accuracy — does the instrument actually measure the construct it is supposed to measure? An instrument can be reliable without being valid, but a valid instrument must also be reliable.” The analogy of an archer hitting the same wrong spot consistently (reliable but not valid) versus hitting the target accurately (both) is widely cited and acceptable to use.


2. The Four Types of Validity — and How Each Is Established in Kenyan Postgraduate Research

Chapter 3 — Validity Framework

Validity is not a single property that a questionnaire either has or lacks — it is a multi-dimensional concept encompassing several distinct types, each of which must be addressed in your Chapter 3 methodology. Kenyan university guidelines at KU, UoN, JKUAT, and MKU consistently require that proposals and dissertations address at minimum face validity, content validity, and construct validity. The following four types are the standard framework in Kenyan postgraduate quantitative research.

Validity Type 1
Face Validity — Does the Instrument Appear to Measure What It Claims?

Face validity is the most basic and least technically rigorous form of validity — it is the extent to which the instrument appears, on its face and to a non-expert reviewer, to be measuring the intended construct. It is established through a review of the instrument by subject matter experts or by members of the target population who assess whether the items seem relevant and appropriate. Face validity does not confirm that the instrument actually measures what it claims, but its absence — items that look irrelevant, confusingly worded, or clearly misaligned with the stated construct — is a signal that more fundamental problems exist. In Kenyan dissertations, face validity is typically established during the pilot study phase, when supervisors, subject experts, and sometimes a small group of target respondents review the questionnaire for clarity and apparent relevance before administration.

Validity Type 2
Content Validity — Does the Instrument Adequately Cover the Domain It Claims to Measure?

Content validity assesses whether the items in an instrument collectively and proportionately represent the full domain of the construct being measured. It is established through systematic expert review — not just a casual read-through, but a structured rating process in which subject matter experts assess each item for relevance, clarity, and representativeness, and their ratings are quantified using the Content Validity Index (CVI). Content validity is essential for any instrument that claims to comprehensively measure a construct: if your questionnaire claims to measure “supply chain resilience” but only has items about supplier relationships and ignores logistics flexibility, disaster recovery, and demand uncertainty, it has poor content validity regardless of how high its Cronbach’s Alpha is.

Validity Type 3
Construct Validity — Does the Instrument Measure the Theoretical Construct It Represents?

Construct validity is the broadest and most theoretically demanding form of validity. It assesses whether the instrument actually measures the underlying theoretical construct it is intended to represent — and whether it measures that construct specifically, rather than related but distinct constructs. Construct validity has two components that are both expected in PhD-level research and increasingly required in Masters work at leading Kenyan institutions: convergent validity (items or subscales intended to measure the same construct correlate strongly with each other) and discriminant validity (items intended to measure different constructs do not correlate so highly that the constructs cannot be distinguished). Construct validity is established primarily through factor analysis — both Exploratory Factor Analysis (EFA) and, at a more advanced level, Confirmatory Factor Analysis (CFA).

Validity Type 4
Criterion Validity — Does the Instrument Predict an Outcome It Should Predict?

Criterion validity assesses whether scores on the instrument are related to scores on an established external criterion — either measured at the same time (concurrent validity) or in the future (predictive validity). For example, if a new instrument measuring financial management practices is valid, firms with high scores on it should also show better audited financial performance (concurrent) or better future profitability (predictive). Criterion validity is less commonly reported in Kenyan Masters dissertations but is standard at PhD level for newly developed instruments. It requires that an established, validated external criterion measure exists for comparison — which is not always the case in emerging or under-studied research contexts in Kenya.


3. Content Validity and the Content Validity Index (CVI): The Expert Panel Method

The Content Validity Index (CVI) is the standard quantitative method for establishing content validity in Kenyan postgraduate research. It was developed by Lynn (1986) and operationalised by Polit and Beck (2006) and involves having a panel of subject matter experts rate each item in your instrument on a four-point relevance scale, then computing the proportion of experts who agreed that each item is relevant. The resulting statistics — the Item-level CVI (I-CVI) and the Scale-level CVI (S-CVI) — provide objective, defensible evidence of content validity that satisfies the requirements of Kenyan university panels far more convincingly than a qualitative statement that “experts reviewed the instrument.”

How to Calculate the Content Validity Index

  1. Assemble your expert panel. A minimum of three to five experts with subject matter expertise in the research area is the accepted standard at Kenyan universities. Experts should have relevant academic qualifications (typically a Masters degree or above) and knowledge of the construct being measured. For a study on strategic management in Kenyan public institutions, appropriate experts might include senior academic staff in strategic management, experienced practitioners in public sector management, and a methodologist familiar with measurement.
  2. Distribute the instrument with a structured rating form. Alongside your questionnaire, provide experts with a rating form asking them to assess each item on a 4-point scale: 1 = Not relevant; 2 = Somewhat relevant (needs major revision); 3 = Relevant (needs minor revision); 4 = Very relevant. Ask them also to rate each item for clarity using the same scale. Provide space for written comments on specific wording improvements.
  3. Calculate the Item-level CVI (I-CVI) for each item. The I-CVI for each item is calculated as the number of experts rating the item 3 or 4 (relevant or very relevant) divided by the total number of experts: I-CVI = (Number of experts rating item 3 or 4) ÷ (Total number of experts). With five experts, an item on which four rate it as relevant gives I-CVI = 4/5 = 0.80. The commonly applied minimum threshold for I-CVI is 0.78. Lynn (1986) proposed that figure for panels of six or more experts and recommended unanimous agreement (I-CVI = 1.00) for panels of five or fewer, so confirm which standard your department applies before you fix the panel size. Items below the threshold should be revised or removed.
  4. Calculate the Scale-level CVI (S-CVI/Ave). The S-CVI/Ave is the average of all item-level CVIs: S-CVI/Ave = Sum of all I-CVIs ÷ Total number of items. An S-CVI/Ave of 0.90 or above is considered excellent content validity (Polit & Beck, 2006); values between 0.80 and 0.89 are acceptable. Values below 0.80 indicate that the instrument as a whole does not adequately represent the content domain and requires substantial revision.
  5. Revise items based on expert feedback. Items with I-CVI below threshold should be revised based on the written comments from experts, not simply deleted. After revision, the affected items should be circulated to experts for a second rating round if significant changes were made. Document the revision process — panels will ask about it.
  • 0.78: minimum I-CVI per item commonly accepted for panels of six or more experts (Lynn, 1986)
  • 0.90: S-CVI/Ave threshold for excellent scale-level content validity (Polit & Beck, 2006)
  • 3–5: recommended number of expert raters for CVI calculation in Kenyan postgraduate research
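
The CVI arithmetic described in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not a required tool: the item labels, the ratings, and the function names are all hypothetical, and the 0.78 cut-off is passed as a parameter so you can substitute whichever threshold your department applies.

```python
# Hypothetical expert ratings: one list per item, one score per expert (1-4 relevance scale).
ratings = {
    "Q1": [4, 4, 3, 4, 3],
    "Q2": [4, 3, 2, 4, 3],
    "Q3": [2, 3, 2, 4, 1],
}

def item_cvi(scores):
    """I-CVI: proportion of experts who rated the item 3 or 4."""
    relevant = sum(1 for s in scores if s >= 3)
    return relevant / len(scores)

def scale_cvi_ave(ratings_by_item):
    """S-CVI/Ave: mean of the item-level CVIs."""
    values = [item_cvi(s) for s in ratings_by_item.values()]
    return sum(values) / len(values)

def items_below(ratings_by_item, threshold=0.78):
    """Items whose I-CVI falls below the chosen threshold (assumed 0.78 here)."""
    return [item for item, s in ratings_by_item.items() if item_cvi(s) < threshold]

i_cvis = {item: item_cvi(s) for item, s in ratings.items()}
s_cvi_ave = scale_cvi_ave(ratings)
flagged = items_below(ratings)

for item, value in i_cvis.items():
    print(f"{item}: I-CVI = {value:.2f}")
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
print("Items to revise:", flagged)
```

In this made-up example only Q3 falls below the cut-off, so it is the item that would go back to the experts with revised wording.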
✍️ CVI Reporting — Chapter 3 Template

“Content validity was established through a structured expert panel review using the Content Validity Index (CVI) procedure (Lynn, 1986; Polit & Beck, 2006). Five subject matter experts — three academics specialising in strategic management and two senior public sector practitioners — independently rated each item in the questionnaire on a four-point relevance scale (1 = not relevant to 4 = very relevant). Item-level CVIs (I-CVIs) were computed as the proportion of experts rating each item as relevant (3) or very relevant (4). Items with I-CVIs below the accepted threshold of 0.78 (Lynn, 1986) were revised based on expert feedback. The resulting I-CVIs ranged from 0.80 to 1.00, and the scale-level CVI (S-CVI/Ave) was 0.93, indicating excellent content validity for the instrument (Polit & Beck, 2006).”


4. The Pilot Study: How to Conduct It, What to Test, and How to Report It

A pilot study is a small-scale preliminary administration of your data collection instrument to a subset of the target population, conducted before the main data collection exercise. Its purpose is to identify problems with instrument wording, length, sequencing, or comprehensibility; to test the practical logistics of data collection; and — critically for quantitative studies — to compute preliminary reliability statistics (Cronbach’s Alpha) that demonstrate your instrument is internally consistent before you use it on your full sample.

🇰🇪 The Kenyan University Standard for Pilot Studies

At Kenyan universities including Kenyatta University, JKUAT, MKU, and Laikipia University, the standard guidance is that the pilot study sample should be drawn from the same population as the main study but should not overlap with the main study sample — pilot respondents are excluded from the final analysis. The most widely cited minimum pilot sample in Kenyan postgraduate guidelines is 10% of the main sample size, with a common practical floor of 30 respondents (Field, 2018; Mugenda & Mugenda, 2003) — enough to produce stable reliability estimates without being so large that it significantly depletes the pool of accessible respondents. Some Kenyan universities explicitly state 30 respondents as the pilot study standard in their guidelines; always check your institution’s specific requirement.
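
As a quick illustration of the sizing rule above, the following sketch applies "10% of the main sample, with a floor of 30". The function name and default values are assumptions made for this example; your institution's guideline takes precedence over both.

```python
import math

def pilot_sample_size(main_sample, percent=10, floor=30):
    """Pilot size as a percentage of the main sample, never below the floor.

    percent and floor are illustrative defaults, not a universal rule.
    An integer percentage is used to avoid floating-point rounding surprises.
    """
    return max(floor, math.ceil(main_sample * percent / 100))

for n in (200, 384, 1000):
    print(n, "->", pilot_sample_size(n))
```

For small main samples the floor dominates, which is why many students end up with a pilot of exactly 30.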

What to Test in the Pilot Study

Pilot Test 1
Instrument Clarity and Comprehensibility

After administering the questionnaire, ask pilot respondents to identify any items they found unclear, ambiguous, double-barrelled (asking about two things simultaneously), or offensive. Note items where respondents took significantly longer than expected, left blanks, or asked for clarification during administration. Revise all problematic items before the main data collection. The goal is to ensure that every respondent in the main study understands each item in the way it was intended — a questionnaire that means different things to different respondents produces unreliable data regardless of how high the Cronbach’s Alpha appears to be.

Pilot Test 2
Internal Consistency Reliability (Cronbach’s Alpha)

Enter the pilot data into SPSS and run a reliability analysis for each subscale in your instrument. The Cronbach’s Alpha values from the pilot study are what you report in Chapter 3 as evidence that your instrument is reliable before main data collection begins. If any subscale produces an alpha below the accepted threshold of 0.70 (Nunnally, 1978), investigate the item-total correlations and “Alpha if Item Deleted” statistics to identify underperforming items. Revise or remove items as appropriate, then re-test if significant changes were made. Report both the pre-revision and post-revision alpha values where items were removed, with justification.

Pilot Test 3
Administration Logistics

Record how long the questionnaire takes to complete, note any challenges in accessing respondents, and identify any logistical issues with the data collection procedure. If the instrument takes significantly longer than anticipated, consider whether items can be reduced without compromising coverage. If access to respondents was difficult, revisit your data collection procedure and permissions chain. These practical findings should be briefly noted in Chapter 3 under the Data Collection Procedure sub-section.


5. Cronbach’s Alpha: Running It in SPSS, Reading Every Table, and Knowing When to Remove Items

Cronbach’s Alpha (α), developed by Lee Cronbach in 1951, is the most widely used measure of internal consistency reliability in social science and postgraduate research. It summarises how strongly the items in a scale covary: the coefficient rises with the average inter-item correlation and with the number of items, so it is not simply the average correlation itself. Values normally fall between 0 and 1; a negative value can occur and usually signals a problem such as a negatively worded item that has not been reverse-coded. An alpha of 1.0 would indicate perfect internal consistency, with all items perfectly correlated with each other. An alpha near 0 indicates an absence of consistency. In practice, alpha values between 0.70 and 0.95 are considered acceptable to excellent for research instruments in postgraduate studies (Nunnally & Bernstein, 1994; George & Mallery, 2010).
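
For reference, the standard formula behind the coefficient is shown below. The symbols are introduced here for illustration: k is the number of items in the scale, \sigma_i^2 is the variance of item i, and \sigma_T^2 is the variance of respondents' total scores across all k items.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_T^{2}}\right)
```

The formula makes two practical points visible: alpha increases as items share more variance, and it also increases simply because a scale has more items.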

The Cronbach’s Alpha Interpretation Scale

Alpha Value | Interpretation | Implication for Your Study
α ≥ 0.90 | Excellent internal consistency | Strong evidence of reliability; report and proceed. Note: values above 0.95 may indicate item redundancy, meaning items may be too similar to each other.
0.80 ≤ α < 0.90 | Good internal consistency | Solid reliability for a research instrument. Report and proceed without qualification.
0.70 ≤ α < 0.80 | Acceptable internal consistency | Meets the minimum threshold for most Kenyan university research contexts. Report and proceed; the standard is satisfied.
0.60 ≤ α < 0.70 | Questionable internal consistency | Below the recommended threshold. Investigate item-total correlations for weak items. Revise or remove items with corrected item-total correlations below 0.30. Report the steps taken to address the issue.
α < 0.60 | Poor internal consistency | Significant reliability problem. Return to item review, consider whether the scale items genuinely measure the same construct, and revise substantially before proceeding to main data collection.
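
If you are labelling several subscales at once, the bands in the table above can be encoded as a small helper. This is only a convenience sketch; the function name is hypothetical and the cut-offs are copied from the table.

```python
def interpret_alpha(alpha):
    """Return the interpretation band from the table for a given alpha value."""
    if alpha >= 0.90:
        return "Excellent"
    if alpha >= 0.80:
        return "Good"
    if alpha >= 0.70:
        return "Acceptable"
    if alpha >= 0.60:
        return "Questionable"
    return "Poor"

# Hypothetical subscale results, labelled in one pass.
pilot_alphas = {"Leadership": 0.84, "Motivation": 0.79, "Engagement": 0.64}
labels = {name: interpret_alpha(a) for name, a in pilot_alphas.items()}
print(labels)
```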

Running Cronbach’s Alpha in SPSS: Step by Step

  1. Go to Analyze → Scale → Reliability Analysis.
  2. Move all items belonging to one subscale into the Items box. Important: run reliability analysis separately for each subscale or construct in your instrument. Do not lump items from different constructs together, because a high overall alpha computed across unrelated constructs has no theoretical meaning.
  3. Ensure the Model is set to Alpha (Cronbach’s).
  4. Click Statistics: tick Item, Scale, and Scale if item deleted under Descriptives. Also tick Correlations under Inter-Item. Click Continue.
  5. Click OK. SPSS will produce: the Reliability Statistics table (containing Cronbach’s Alpha and the number of items); the Item Statistics table (means and standard deviations for each item); the Inter-Item Correlation Matrix; and the Item-Total Statistics table (the most important table for diagnosis).
  6. Repeat this procedure for every subscale or construct in your questionnaire. Record the alpha value for each scale in a summary table.
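
If you want to cross-check the SPSS output by hand, the same coefficient can be computed with the Python standard library. The sketch below is an assumption-laden illustration rather than a replacement for SPSS: the function name and the three-item, six-respondent data set are invented, and each inner list holds one item's scores across respondents.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's Alpha for a list of item columns (one list of scores per item).

    Uses sample variances throughout; the ratio is unaffected as long as
    the same variance definition is applied to items and totals.
    """
    k = len(items)
    totals = [sum(row) for row in zip(*items)]      # each respondent's total score
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical pilot data: 3 items answered by 6 respondents on a 5-point scale.
subscale = [
    [4, 5, 3, 4, 2, 5],   # item 1
    [4, 4, 3, 5, 2, 5],   # item 2
    [3, 5, 2, 4, 3, 4],   # item 3
]
alpha = cronbach_alpha(subscale)
print(f"Cronbach's Alpha = {alpha:.2f}")
```

As in SPSS, call the function once per subscale rather than on the whole questionnaire.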

6. Item-Total Correlations and the “Alpha if Item Deleted” Diagnostic

The Item-Total Statistics table produced by SPSS is the diagnostic heart of reliability analysis. It tells you not just whether the scale as a whole is reliable, but which specific items are contributing to or undermining that reliability. Understanding and using this table correctly is what separates a methodologically competent Chapter 3 from one that simply reports a number.

Item-Total Statistics Column 1
Scale Mean if Item Deleted

The mean of the overall scale if this item were removed. Use this column to check for items that shift the mean substantially — this can happen when an item is measuring something systematically different from the rest of the scale, or when a negatively worded item has not been reverse-coded (a common error in Kenyan postgraduate research that produces anomalous statistics throughout the reliability output).

Item-Total Statistics Column 2
Corrected Item-Total Correlation

This is the correlation between the item and the total scale score, corrected for the item’s own contribution to the total (to avoid inflating the correlation). It is the most useful single statistic for identifying problematic items. A Corrected Item-Total Correlation below 0.30 indicates that the item is not measuring the same construct as the rest of the scale and should be investigated for revision or removal. A value below 0.20 is a strong signal for removal. Values above 0.30 — and ideally above 0.40 — indicate that the item is a valid contributor to the scale.

Item-Total Statistics Column 3
Cronbach’s Alpha if Item Deleted

This column shows what the overall Cronbach’s Alpha for the scale would be if each item were deleted. If the “Alpha if Item Deleted” value for a particular item is substantially higher than the current overall alpha, removing that item would improve the scale’s reliability — and the item should be considered for removal. The decision to remove an item should not be made on statistical grounds alone: the item should also be evaluated on theoretical grounds. An item that measures something conceptually important to the construct should be retained even if its deletion would marginally improve alpha — but an item that is both statistically weak (low corrected item-total correlation) and theoretically peripheral has a strong case for removal.

The over-deletion trap: Removing items purely to maximise Cronbach’s Alpha — deleting items one by one until alpha is as high as possible — is methodologically dishonest and will be challenged by your panel if the number of items remaining per subscale becomes very small (fewer than three items per subscale raises serious construct coverage concerns). Item deletion should be purposeful and justified — driven by both statistical weakness and theoretical dispensability. Always report which items were removed, why, and the alpha values before and after removal.
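
The two diagnostics discussed in this section can also be reproduced outside SPSS. The following sketch is illustrative only: the helper names and the data are hypothetical, and item 4 has been deliberately constructed to run against the other three items so that the diagnostics have something to flag.

```python
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(items):
    """Cronbach's Alpha for a list of item columns."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    return (k / (k - 1)) * (1 - sum(variance(c) for c in items) / variance(totals))

def item_total_statistics(items):
    """Corrected item-total correlation and alpha-if-deleted for each item.

    Needs at least three items so that alpha can still be computed
    once one item is left out.
    """
    results = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_totals = [sum(row) for row in zip(*rest)]   # total excluding this item
        results.append({
            "item": i + 1,
            "corrected_item_total_r": pearson(col, rest_totals),
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return results

# Hypothetical data: items 1-3 move together, item 4 runs in the opposite direction.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [3, 5, 2, 4, 3, 4],
    [2, 1, 4, 3, 5, 2],
]
stats = item_total_statistics(items)
for row in stats:
    print(row["item"], round(row["corrected_item_total_r"], 2), round(row["alpha_if_deleted"], 2))
```

A pattern like item 4's (a negative corrected item-total correlation and a much higher alpha if deleted) is often a sign of a negatively worded item that was never reverse-coded, so check the coding before deciding to delete anything.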

✍️ Item Deletion Reporting — Chapter 3 Template

“Initial reliability analysis of the six-item employee engagement subscale yielded a Cronbach’s Alpha of 0.64. Examination of the Item-Total Statistics revealed that Item 3 (‘I am proud of the organisation’s products’) had a Corrected Item-Total Correlation of 0.18, substantially below the accepted threshold of 0.30 (Pallant, 2020), and the ‘Alpha if Item Deleted’ value indicated that removing Item 3 would increase scale reliability to 0.72. Following a review of the item’s theoretical alignment, which suggested it was measuring organisational pride rather than engagement per se, Item 3 was removed from the subscale. The revised five-item employee engagement subscale achieved a Cronbach’s Alpha of 0.72, meeting the accepted reliability threshold (Nunnally & Bernstein, 1994). All subsequent analyses used the revised five-item scale.”


7. Beyond Cronbach’s Alpha: Test-Retest and Inter-Rater Reliability

Cronbach’s Alpha measures only one form of reliability — internal consistency. Depending on your research design and data collection method, your panel may ask about additional forms of reliability that are appropriate to your specific study. Understanding when each is applicable prevents you from being caught by a question you did not anticipate.

Reliability Type 2
Test-Retest Reliability — Stability Across Time

Test-retest reliability assesses whether an instrument produces consistent scores when administered to the same respondents on two separate occasions, with a gap between administrations (typically two to four weeks — long enough that respondents are unlikely to remember their specific answers, but short enough that genuine change in the construct is unlikely to have occurred). It is quantified using the Pearson correlation coefficient (for continuous data) or the Intraclass Correlation Coefficient (ICC) as a more rigorous alternative. Test-retest reliability is most relevant for instruments measuring stable traits (personality, attitudes, cognitive abilities) rather than state variables that legitimately fluctuate over time. In Kenyan postgraduate research, test-retest reliability is more common at PhD level than Masters level, where the time and resource cost of two-round data collection is often prohibitive given tight timelines.
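
For the Pearson option mentioned above, a test-retest coefficient is simply the correlation between each respondent's total score at the first and second administration. The sketch below uses invented scores for five respondents; the function name is hypothetical, and it does not cover the ICC, which requires a dedicated statistical package.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical total scores for the same five respondents, a few weeks apart.
time_1 = [20, 25, 30, 22, 28]
time_2 = [21, 24, 31, 23, 27]

test_retest_r = pearson(time_1, time_2)
print(f"Test-retest r = {test_retest_r:.2f}")
```

The two lists must be in the same respondent order, which means you need a way of matching questionnaires across the two rounds (for example, an anonymous code).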

Reliability Type 3
Inter-Rater Reliability — Consistency Across Observers

Inter-rater reliability assesses whether two or more independent raters applying the same instrument to the same subject produce consistent results. It is relevant for studies that involve observation, content analysis, qualitative coding, or any assessment process where human judgment is applied to categorise or score data. Inter-rater reliability for categorical variables is measured using Cohen’s Kappa (κ), which accounts for agreement that would be expected by chance. Landis and Koch (1977) describe Kappa values of 0.61 to 0.80 as substantial agreement and values above 0.80 as almost perfect; a working minimum of about 0.70 is often applied in practice. For continuous ratings, the Intraclass Correlation Coefficient (ICC) is preferred. At Kenyan universities, inter-rater reliability is most commonly encountered in mixed-methods studies where qualitative data is being coded systematically, in observation-based studies, and in studies involving clinical or educational assessment.
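
To make the chance correction concrete, here is a minimal sketch of Cohen's Kappa for two raters. The function name and the ten coded units are hypothetical; the calculation compares observed agreement with the agreement expected if each rater assigned categories independently at their own base rates.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters coding the same units into categories."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two raters to the same ten interview excerpts.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's Kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 excerpts, yet Kappa is noticeably lower than 0.80 because a good share of that agreement would be expected by chance alone.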


8. Construct Validity: Convergent and Discriminant Evidence Through Factor Analysis

Construct validity — whether your instrument truly measures the theoretical construct it represents — is established through statistical procedures that go beyond internal consistency. The two most commonly used methods in Kenyan postgraduate research are Exploratory Factor Analysis (EFA) and, at PhD level and in more advanced Masters studies, Confirmatory Factor Analysis (CFA) using SPSS AMOS or R.

Convergent Validity

Convergent validity is evidence that items or subscales intended to measure the same construct are highly correlated with each other. In factor analysis terms, items intended to measure a single construct should load strongly (factor loading ≥ 0.40 is the widely accepted minimum; ≥ 0.50 is preferred) on the same factor and have low loadings on all other factors. High Average Variance Extracted (AVE ≥ 0.50) for a construct in CFA is the formal criterion for convergent validity in structural equation modelling studies. In simpler SPSS-based analyses at Masters level, convergent validity is often established by showing that items within each subscale are substantially intercorrelated (examining the inter-item correlation matrix from the reliability analysis) and that EFA clusters items as theoretically expected.
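
The AVE criterion mentioned above is straightforward to compute once you have standardised factor loadings from a CFA: it is the mean of the squared loadings for the items of one construct. The loadings and function name below are hypothetical values chosen for illustration.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardised loadings for one construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical standardised loadings for a four-item construct.
loadings = [0.72, 0.81, 0.68, 0.77]

ave = average_variance_extracted(loadings)
meets_criterion = ave >= 0.50
print(f"AVE = {ave:.3f}, meets 0.50 criterion: {meets_criterion}")
```

Because the loadings are squared, a construct needs an average loading of roughly 0.71 to reach an AVE of 0.50, which is a stricter bar than the 0.40 minimum loading used in EFA.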

Discriminant Validity

Discriminant validity is evidence that constructs intended to be distinct are not so highly correlated as to be indistinguishable. If two subscales in your instrument claim to measure different constructs but correlate at r = .90, it is difficult to argue that they are measuring genuinely different things, and discriminant validity is compromised. In factor analysis, good discriminant validity is demonstrated when items do not cross-load substantially on factors other than the one they are supposed to measure (commonly cited cut-offs for cross-loadings range from 0.32 to 0.40), and when the correlation between constructs is below 0.85 (Hair et al., 2010).

🇰🇪 What Kenyan University Panels Expect on Construct Validity

At Masters level at KU, UoN, and JKUAT, construct validity is most commonly established through EFA results showing that items load as expected — demonstrating that the factor structure of the instrument corresponds to the theoretical constructs defined in the conceptual framework. At PhD level, CFA using AMOS or R is increasingly expected as the minimum standard, with fit indices (CFI ≥ 0.95, RMSEA ≤ 0.06 by Hu & Bentler’s criteria, or CFI ≥ 0.90, RMSEA ≤ 0.08 by the more lenient standard) reported alongside factor loadings and AVE values. If your study uses existing validated scales from prior research (e.g., established organisational commitment scales, validated technology acceptance instruments), you may reference the original validation studies as evidence of construct validity rather than re-establishing it from scratch — but you should still run reliability analysis on your sample’s data.


9. Reporting Reliability and Validity in Chapter 3 and Chapter 4

Reliability and validity evidence belongs in two places in a Kenyan university dissertation: a prospective declaration in Chapter 3 (explaining how reliability and validity will be established, and reporting pilot study results) and a confirmatory report in Chapter 4 (confirming that the main sample data produced reliability estimates consistent with the pilot). The Chapter 3 section is typically titled “Reliability and Validity of Research Instruments” or “Trustworthiness of Instruments” and appears after the Data Collection Instruments sub-section and before Data Collection Procedure.

✅ Chapter 3 Reliability & Validity — Include This

  • Define both reliability and validity clearly in your own words with citations
  • State which types of validity you are establishing (face, content, construct) and why those types are appropriate for your study
  • Describe the expert panel review process for face and content validity — number of experts, their qualifications, the rating procedure
  • Report the CVI calculation method and the I-CVI and S-CVI values obtained
  • Describe the pilot study: sample size, how it was drawn, what was tested, and what revisions resulted
  • Report the pilot Cronbach’s Alpha for each subscale in a table — with number of items, alpha value, and interpretation
  • State the threshold applied (typically α ≥ 0.70) and the authority cited
  • Note any items removed during piloting with the before-and-after alpha values and the justification

❌ Common Reliability & Validity Errors in Kenyan Dissertations

  • Reporting a single Cronbach’s Alpha for the entire questionnaire rather than separately for each subscale
  • Reporting alpha values without specifying the number of items in each scale
  • Confusing reliability with validity in the write-up — treating Cronbach’s Alpha as evidence of validity
  • Stating “CVI was established” without reporting the numerical values or the expert review process
  • Conducting the pilot study on the same respondents used in the main study — this invalidates both the pilot and main analyses
  • Reporting only the final alpha after items have been removed, without disclosing that items were removed or why
  • Not reporting validity and reliability results in Chapter 4 — only mentioning them once in Chapter 3
  • Applying a threshold of α ≥ 0.60 without citing authority — the standard at most Kenyan institutions is α ≥ 0.70
✍️ Full Reliability Table — Chapter 3 Reporting Template

“Table 3.3 presents the Cronbach’s Alpha coefficients obtained from the pilot study for each subscale of the research instrument. All subscales achieved reliability coefficients at or above the accepted threshold of 0.70 (Nunnally & Bernstein, 1994), indicating adequate internal consistency for all constructs measured by the instrument.”

Table 3.3: Summary of Pilot Study Reliability Statistics

Construct / Subscale | No. of Items | Cronbach’s Alpha (α) | Interpretation
Transformational Leadership | 6 | 0.84 | Good
Employee Motivation | 5 | 0.79 | Acceptable
Organisational Performance | 7 | 0.87 | Good
Organisational Culture | 5 | 0.76 | Acceptable
Overall Instrument | 23 | 0.89 | Good

Source: Pilot Study Data (2024). n = 30.


10. Panel Questions on Reliability and Validity — and How to Answer Each One

🎓 What Kenyan Panel Reviewers Ask About Reliability and Validity
Panel Question | What They Are Testing | How to Prepare Your Answer
“What is the difference between reliability and validity?” | Whether you understand the fundamental distinction, the most basic question in this area | Reliability = consistency (same results under same conditions). Validity = accuracy (measures what it claims to). Reliability is necessary but not sufficient for validity. Use the archer or weighing scale analogy.
“How did you establish the validity of your instrument?” | Whether your validity evidence is systematic and documented, not just “experts reviewed it” | Name the types of validity you addressed: face validity (expert review), content validity (CVI calculation with specific I-CVI and S-CVI values), and construct validity (factor analysis or inter-subscale correlation patterns). Give specific numbers.
“What is the Content Validity Index and how did you calculate it?” | Whether you understand the CVI procedure or only cited it without applying it | Explain the four-point rating scale, the I-CVI formula (number of experts rating 3 or 4 ÷ total experts), the 0.78 threshold, and the S-CVI/Ave as the scale-level summary. State your specific values.
“What is Cronbach’s Alpha and what does your value mean?” | Whether you understand what alpha measures and can interpret your specific result | Alpha measures internal consistency: how closely the items in a scale vary together. Its value depends on both the average inter-item correlation and the number of items. State your value for each subscale and the interpretation category. Cite Nunnally (1978) or Nunnally & Bernstein (1994) for the 0.70 threshold.
“Did you remove any items from your questionnaire? Why?” | Whether you know how to use the Item-Total Statistics diagnostics, and whether you can justify any changes made | Reference the Corrected Item-Total Correlation threshold of 0.30 and the “Alpha if Item Deleted” statistic. Name the specific item(s) removed, their correlation values, the alpha improvement achieved, and the theoretical justification for removal.
“How many respondents were in your pilot study and why that number?” | Whether your pilot study met accepted minimum requirements and was methodologically deliberate | State the number (typically 30, or 10% of main sample), explain that pilot respondents were excluded from the main study, cite the authority for your sample size choice (e.g., Field, 2018; Mugenda & Mugenda, 2003), and confirm the pilot was drawn from the same population.
“Can Cronbach’s Alpha alone confirm that your instrument is valid?” | Whether you understand the scope and limitations of what alpha measures | No. Cronbach’s Alpha measures internal consistency (a form of reliability), not validity. A scale can have high internal consistency while measuring the wrong construct entirely. Validity requires additional evidence: content validity (CVI), face validity (expert review), and construct validity (factor analysis or convergent/discriminant evidence).
“What is construct validity and how did you establish it?” | Whether you understand the most rigorous and theoretically demanding form of validity | Construct validity confirms that the instrument measures the theoretical construct it is supposed to represent. Establish it through EFA (showing items load as theoretically expected) and by demonstrating convergent validity (items within subscales correlate) and discriminant validity (subscales measuring different constructs are distinguishable).
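
The CVI arithmetic that the panel asks about is simple enough to verify by hand. The sketch below is a minimal illustration, assuming a four-point relevance scale and the 0.78 I-CVI threshold described in this guide. The expert ratings, item names, and function names are hypothetical.

```python
def i_cvi(ratings):
    """Item-level CVI: share of experts rating the item 3 or 4 on a 1-4 scale."""
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)


def s_cvi_ave(ratings_by_item):
    """Scale-level CVI (averaging method): mean of the item-level CVIs."""
    values = [i_cvi(ratings) for ratings in ratings_by_item.values()]
    return sum(values) / len(values)


def items_below_threshold(ratings_by_item, threshold=0.78):
    """Items whose I-CVI falls below the threshold and need revision or removal."""
    return [item for item, ratings in ratings_by_item.items()
            if i_cvi(ratings) < threshold]


if __name__ == "__main__":
    # Hypothetical relevance ratings from five experts for three items.
    expert_ratings = {
        "Q1": [4, 4, 3, 4, 4],
        "Q2": [4, 3, 4, 2, 4],
        "Q3": [2, 3, 2, 4, 1],
    }
    for item, ratings in expert_ratings.items():
        print(item, round(i_cvi(ratings), 2))
    print("S-CVI/Ave:", round(s_cvi_ave(expert_ratings), 2))
    print("Below threshold:", items_below_threshold(expert_ratings))
```

Reporting the per-item values alongside the scale-level average, rather than the average alone, lets the panel see which items the experts questioned and how you responded.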

Expert Reliability and Validity Testing Support for Kenyan Masters and PhD Students

At Tobit Research Consulting, we help postgraduate students at KU, UoN, JKUAT, MKU, Strathmore, Laikipia, Egerton, Moi, and all other Kenyan universities establish rigorous, panel-ready reliability and validity evidence for their research instruments. Our services include:

  • Content Validity Index (CVI) calculation and expert panel coordination — I-CVI and S-CVI reporting
  • Pilot study design: sample selection, administration guidance, and results analysis
  • Full Cronbach’s Alpha analysis in SPSS for all subscales, with Item-Total Statistics interpretation
  • Item-deletion decisions: statistical and theoretical justification for any items removed
  • Reliability table preparation in APA 7th edition format for Chapter 3
  • Construct validity analysis using Exploratory Factor Analysis (EFA) in SPSS
  • Confirmatory Factor Analysis (CFA) using IBM SPSS Amos for PhD-level construct validation
  • Convergent and discriminant validity assessment with AVE, CR, and inter-construct correlation matrices
  • Test-retest reliability computation using Pearson r or ICC for applicable study designs
  • Chapter 3 write-up: complete reliability and validity section in the style required by your institution
  • Panel preparation: coaching on how to answer examiner questions about your measurement quality

Whether you are building a new instrument from scratch, validating an adapted questionnaire, or revisiting reliability evidence after a panel revision, our consultants will help you produce the measurement quality evidence your institution requires.


📍 Bruce House, 4th Floor, Nairobi CBD, Kenya  |  Tel: +254 728 430 728  |  tobitresearchconsulting.com


This guide is part of Tobit Research Consulting’s Data Analysis Series for Kenyan postgraduate students. Key methodological sources include: Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334; Nunnally, J.C. & Bernstein, I.H. (1994). Psychometric Theory (3rd ed.), McGraw-Hill; George, D. & Mallery, P. (2010). SPSS for Windows Step by Step, Pearson; Lynn, M.R. (1986). Determination and quantification of content validity. Nursing Research, 35(6), 382–386; Polit, D.F. & Beck, C.T. (2006). The content validity index: Are you sure you know what’s being reported? Research in Nursing & Health, 29(5), 489–497; Pallant, J. (2020). SPSS Survival Manual (7th ed.), McGraw-Hill; Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.), SAGE; Hair, J.F. et al. (2010). Multivariate Data Analysis (7th ed.), Pearson; Landis, J.R. & Koch, G.G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174; and Mugenda, O.M. & Mugenda, A.G. (2003). Research Methods: Quantitative and Qualitative Approaches, ACTS Press.
