The Stark QoL – measuring Quality of Life somewhat differently
Jochen Hardt (hardt at mail dot uni-mainz dot de)
Medical Psychology and Medical Sociology, Clinic for Psychosomatic Medicine and Psychotherapy, Universitätsmedizin, Johannes Gutenberg University, Mainz, Germany
Cite as
Research 2014;1:1065

Objectives: The aim of the present paper is to introduce a questionnaire measuring health-related quality of life that consists mainly of pictures and uses only a minimum of words. It comprises a mental and a physical health component. Methods: A sample of n = 445 medical students filled out the Stark quality of life (QoL) questionnaire during a seminar in their fourth preclinical semester. Results: The Stark QoL questionnaire shows good psychometric properties. The Mental Component displays a Cronbach's alpha of .77, the Physical Component of .74. The time required to fill it out is between three and five minutes. Conclusions: The Stark QoL is a short and easy-to-apply questionnaire which may be particularly promising for international research.


Health-related quality of life (HRQoL) has become increasingly important in medicine. A search in PubMed reveals that about 2000 articles containing the term QoL appear per year. Two groups of instruments are in use to measure HRQoL: disease-specific ones, like the various cancer QoL questionnaires developed by the European Organisation for Research and Treatment of Cancer [1], and generic ones, like the Short Form-36 [2]. Both kinds of instruments are important. The disease-specific ones capture the symptoms and complaints that patients with a certain disease usually have. The generic ones allow various patient groups to be compared with each other and with healthy controls. A variety of instruments for measuring quality of life is available; for an overview see, for example, [3, 4]. Most of the generic instruments comprise items about physical functioning, social relationships and subjective well-being.

About 25 years ago, an instrument for measuring health-related quality of life (HRQoL) was presented that relied partly on pictures and partly on text: the Dartmouth COOP Charts [5, 6]. The name is an abbreviation for the questionnaire used in the Dartmouth-Northern New England Primary Care Cooperative Information Project, and it comprises the dimensions Physical Status, Emotional Status, Daily Activities, Social Activities, Social Support, Pain, and Overall Health. The charts were designed for use in a study on primary care in seven countries, to measure the functional status of a large number of patients more efficiently. At that time, the measure was labeled functional status, not quality of life. However, in addition to physical functioning, the charts contained questions about various other domains, so they covered most of what is today considered HRQoL. Inserting the Dartmouth COOP Charts into a questionnaire booklet made it easier for respondents to fill out. The Dartmouth COOP Charts are still in use [7]. A well-known instrument based on the same idea is the distress thermometer, a single-item scale represented as a vertical eleven-point scale in the shape of a thermometer with the endpoints "no distress" and "extreme distress" [8]. It was developed for cancer research and is still mainly utilized there [9, 10], although it could be applied to any population.

The Stark QoL questionnaire is based on the idea of the Dartmouth COOP Charts and the distress thermometer. However, it relies more heavily on pictures, in that as much of the item content as possible was transferred into the pictures, leaving only very short text elements in between. Avoiding text altogether proved to be impossible, because the respondents needed to know, for example, whether a certain picture displays something they are able to do or something they would like to be able to do. The idea was that the questionnaire could easily be translated into any language and that even respondents who could not read would be able to fill it out if they received some verbal instructions. It is short, usually printed on both sides of a standard page, and has five subscales: Mood, Energy, Social Contact, Availability of Food and Physical Functioning. The first three are represented by one item each and can be combined into a scale called "Mental Component". Food has only one item and stands alone. However, it seems an important item given that approximately 10% of the world's population is still hungry [11]. "Physical Functioning", also termed "Physical Component", comprises eight items. The time required to answer the Stark QoL questionnaire is between three and five minutes. It is called Stark QoL because the pictures were drawn by a German artist named H. P. Stark. An initial version was tested on 23 pupils and some of the pictures were changed; the resulting version was utilized in the present study. The questionnaire can be viewed in an appendix. The aim of the present paper is to introduce the questionnaire and its psychometric properties.

Participants and Methods

The analysis was performed on a sample of 445 medical students who attended a seminar on statistics in their fourth semester between 2009 and 2012. The Stark QoL was utilized to demonstrate some aspects of item construction. After the lesson, students were asked to leave the sheet containing the Stark QoL with the author so that an item analysis could be performed. Participation was voluntary; any student who wanted to take her/his sheet home was free to do so. No names were recorded, nor were class, year, age or gender. All students attending the lectures were asked to fill out the questionnaire; no selection was made. An estimated 90% of the students left their sheet in the classroom after the lesson. These questionnaires are the basis for the present analysis. The remaining 10% of students took their sheets home, having been explicitly told that they were free to do so. The student sample filled out a modified version of the Stark QoL, which had slightly different answering scales with between three and five sub-items for each scale (see Figure 1). The students filled it out during a lesson on statistics and test theory and were informed by the author that the somewhat counterintuitive answering scales were needed to calculate reliability estimates.

Figure 1. The questionnaire.

Since the data used for this paper were collected for a non-scientific purpose, i.e. teaching in this case, and voluntariness as well as anonymity were fully guaranteed, no ethics vote could be obtained for the study according to German law. The rationale behind this is that such a study does not affect a participant’s interests in any way; asking for informed consent would even nullify the anonymity of the participants [12]. All data were entered into the computer twice using the program EpiData [13] to minimize the probability of typing errors.

The Stark QoL

The first item consists of five smileys, with a very happy face at one end and a very sad one at the other. Normally, probands are asked to check the one that best applies to them. The students in the present sample had five categories ("--", "-", "0", "+", "++") under each smiley and were asked to indicate how often they had felt that way in the past month. To avoid confusion, the five smileys will be called item 1 throughout this paper, and the individual smileys are referred to as items 1a to 1e. The second item presents three pictures of a person walking; on the left-hand side the walker is full of energy, and on the right he seems to be walking as if depressed (item 2a … c). The third item displays three pictures of a group of five persons each, one white and four grey. The white person symbolizes the proband himself, the grey ones a peer group. At one end, the white person stands in the middle of the group, at the other end alone (item 3a … c). The fourth item shows three pictures of a table with a plate of food, full and rich on one side and very poor on the other (item 4a … c). All items are to be answered by making a cross under the picture that best applies to one’s own situation. As with item 1, the students of the present sample had a five-point Likert scale under each picture and were asked how often the situation presented had occurred in the past month.

| Scale and item | mean | sd | nmis |
|---|---|---|---|
| (I-III) Mental Component | 52 | 8 | 0 |
| 1 (I) Mood | 55 | 10 | 0 |
| 1a Very good | 71 | 24 | 7 |
| 1b Good | 79 | 19 | 11 |
| 1c Neutral | 56 | 20 | 15 |
| 1d Sad | 32 | 25 | 13 |
| 1e Very sad | 13 | 21 | 13 |
| 2 (II) Energy | 49 | 11 | 0 |
| 2a Full of energy | 73 | 21 | 5 |
| 2b Low in energy | 62 | 23 | 13 |
| 2c No energy | 22 | 24 | 15 |
| 3 (III) Contact | 50 | 11 | 2 |
| 3a Many friends | 75 | 22 | 3 |
| 3b One friend | 63 | 26 | 13 |
| 3c No friend | 19 | 25 | 15 |
| 4 (IV) Food | 48 | 13 | 0 |
| 4a Good | 73 | 27 | 9 |
| 4b Poor | 63 | 28 | 7 |
| 4c Very poor | 20 | 26 | 15 |
| 5 (V) Physical Component | 80 | 14 | 1 |
| 5a Sweeping up | 79 | 25 | 13 |
| 5b Changing a bulb | 76 | 26 | 12 |
| 5c Moving a table | 78 | 23 | 13 |
| 5e Riding a bicycle | 83 | 23 | 12 |
| 5f Jogging | 73 | 26 | 1 |
| 5g Lifting a heavy box | 71 | 26 | 5 |
| 5h Closing shoes | 93 | 15 | 2 |
| 5i Shopping | 84 | 20 | 1 |

Table 1. Item and scale descriptions. sd: standard deviation; nmis: number of missing data.

The fifth item consists of eight pictures displaying various activities and represents a scale for physical functioning (item 5a … h). The pictures show activities like changing a light bulb in a lamp hanging from the ceiling, riding a bicycle, carrying shopping, moving a table, tying shoes, etc. Next to each picture, the above-mentioned five-point Likert scale was displayed, but the instructions for item five differed from those for the previous items. The text reads "I can", and "++" stands for "very well", "+" for "well", "0" for "fairly", "-" for "poorly" and "--" for "very poorly". Probands are asked to indicate how easily they can perform the activity displayed in each picture. This item was not changed from the original version.


All answers were coded between 0 and 100. For items 1-4, 0 means "was very rarely the case in the past month" and 100 means "was often the case in the past month"; for item 5, 0 means "very poorly" and 100 means "very well". Item means, standard deviations and the amount of missing data are displayed in Table 1. Scales were constructed using Stata's [14] command "alpha..., gen()", which generates a scale value as long as at least one item is filled out. Negative items were inverted automatically by Stata. Cronbach's alpha as well as convergent (with its own scale minus the respective item) and discriminant (with a foreign scale) correlations are displayed in Table 2. Table 3 shows the correlations between the subscales/items.
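The coding and scale construction described above can be illustrated with a minimal sketch in plain Python. It is not the Stata procedure used in the study. The evenly spaced coding (0, 25, 50, 75, 100), the use of the mean of the answered sub-items as scale value, the restriction of alpha to complete cases, and all function names and answers below are assumptions made for illustration only.

```python
from statistics import variance

# Assumption: the five answer categories are spaced evenly on the 0-100 range.
CODES = {"--": 0.0, "-": 25.0, "0": 50.0, "+": 75.0, "++": 100.0}


def code_row(answers, negative):
    """Code one respondent's answers; None marks a missing answer.

    `negative` flags the sub-items that are keyed negatively and get inverted.
    """
    coded = []
    for answer, is_negative in zip(answers, negative):
        if answer is None:
            coded.append(None)
        else:
            value = CODES[answer]
            coded.append(100.0 - value if is_negative else value)
    return coded


def scale_score(row):
    """Mean of the answered sub-items, or None if nothing was answered."""
    answered = [v for v in row if v is not None]
    return sum(answered) / len(answered) if answered else None


def cronbach_alpha(rows):
    """Cronbach's alpha, computed from respondents with complete answers only."""
    complete = [r for r in rows if None not in r]
    k = len(complete[0])
    item_variances = [variance(col) for col in zip(*complete)]
    total_variance = variance([sum(r) for r in complete])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical answers to one three-picture item (positive, negative, negative).
raw = [
    ["++", "--", "-"],
    ["+", "-", "0"],
    ["0", "0", None],
    ["-", "+", "++"],
]
coded = [code_row(r, negative=[False, True, True]) for r in raw]
scores = [scale_score(r) for r in coded]
alpha = cronbach_alpha(coded)
```

In this sketch the third respondent still receives a scale value although one sub-item is missing, which mirrors the rule that one answered item is sufficient. Whether Stata treats missing values in the alpha computation in the same way is not claimed here.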

Demographics and applicability of the questionnaire

Although no demographic data were collected, the students in this semester were typically about two thirds female and between 20 and 22 years old; none were younger, some were older. Approximately 90% were of German nationality; the remaining 10% came from countries all over the world. There were few problems or questions when filling out the questionnaire, and the rate of missing data was low, on average 2% per item (range 0.2% to 3.3%). When data were missing, in about two thirds of the cases only one picture from each item was filled out. Hence, these students may have missed the instruction that they should use the special answering form. The remaining third of the missing data showed no detectable pattern.

| Scale and item | α | Me | Mo | En | Co | Fo | Ph |
|---|---|---|---|---|---|---|---|
| (I-III) Mental Component | .77 | | | | | | |
| 1 (I) Mood | .70 | | | | | | |
| 1a Very good | | .42 | .49 | .33 | .28 | .16 | .09 |
| 1b Good | | .27 | .32 | .17 | .13 | .04 | .09 |
| 1c Neutral | | -.38 | -.34 | -.32 | -.19 | -.07 | -.05 |
| 1d Sad | | -.54 | -.59 | -.37 | -.28 | -.20 | -.11 |
| 1e Very sad | | -.43 | -.44 | -.30 | -.21 | -.19 | -.09 |
| 2 (II) Energy | .59 | | | | | | |
| 2a Full of energy | | .38 | .40 | .42 | .20 | .15 | .19 |
| 2b Low in energy | | -.30 | -.20 | -.34 | -.26 | -.09 | -.06 |
| 2c No energy | | -.49 | -.42 | -.40 | -.28 | -.18 | -.19 |
| 3 (III) Contact | .60 | | | | | | |
| 3a Many friends | | .34 | .32 | .29 | .46 | .09 | .20 |
| 3b One friend | | -.21 | -.09 | -.15 | -.33 | -.08 | -.02 |
| 3c No friend | | -.47 | -.33 | -.32 | -.44 | -.16 | -.11 |
| 4 (IV) Food | .68 | | | | | | |
| 4a Good | | .30 | .30 | .20 | .15 | .55 | .07 |
| 4b Poor | | -.11 | -.09 | -.08 | -.12 | -.45 | -.01 |
| 4c Very poor | | -.18 | -.17 | -.11 | -.12 | -.47 | -.06 |
| 5 (V) Physical Component | .74 | | | | | | |
| 5a Sweeping up | | .10 | .08 | .11 | .13 | .03 | .35 |
| 5b Changing a bulb | | .10 | .14 | .14 | .09 | .09 | .50 |
| 5c Moving a table | | .08 | .08 | .08 | .11 | .09 | .59 |
| 5e Riding a bicycle | | .15 | .10 | .10 | .18 | .00 | .44 |
| 5f Jogging | | .19 | .14 | .29 | .15 | .02 | .31 |
| 5g Lifting a heavy box | | .17 | .16 | .18 | .16 | .16 | .53 |
| 5h Closing shoes | | .02 | .05 | -.03 | .02 | .05 | .33 |
| 5i Shopping | | .11 | .13 | .06 | .10 | .13 | .46 |

Table 2. Convergent and divergent item correlations. Reading example: Cronbach’s α of dimension (I-III) Mental Component is .77. The corrected convergent correlation, i.e. the correlation to the sum of all other items of the Mental Component, of the first item “Very good mood” is r=.42. Cronbach’s α of dimension (I) Mood is α=.70. The corrected convergent correlation of the first item with Mood is r=.49. Its correlation to (II) Energy is r=.33. Convergent correlations are those of an item with its own scale (for items 1 to 3, also with the Mental Component); all other correlations are discriminative. Me = Mental Component, Mo = Mood, En = Energy, Co = Contact, Fo = Food, Ph = Physical Component.

Multitrait scaling

Table 1 shows the mean scores for all items and scales. Concerning the Mental Component, results are surprisingly poor. The mean value is only about 50, indicating that the students were not overly happy at this time. However, a look at the single items shows that those displaying positive emotional states received higher means than the negative ones. Food was not a problem in this sample. The Physical Component also received high scores in this sample of young people (mean about 80). Tying shoes was easy for almost everybody in this sample (mean = 93), while lifting a heavy box or changing a bulb was somewhat more difficult (means = 71 and 76). The items of the Physical Component show fewer missing data than items 1 to 4, an effect well known from other HRQoL instruments [15].

Scale reliability

Table 2 shows the reliabilities of the subscales. The Mental and Physical Components display satisfactory values, i.e. Cronbach's alpha = .77 and .74. However, the three items that make up the Mental Component show unsatisfactory reliability. A value of alpha = .70 for Mood is still acceptable, but values of alpha = .59 for Energy and alpha = .60 for Social Contact are low. Food, with alpha = .68, does not show good reliability either.

Table 2 also shows convergent and discriminative correlations of the items. An item is ideal when it has only high convergent and only low discriminative correlations. Seven of the eight items of the Physical Component show such a pattern: the convergent correlations are all high, and the discriminative ones are low. One item constitutes an exception: "5f Jogging". Its correlation to the Physical Component is relatively low (r = .31), and the discriminative correlation to the scale Energy is relatively close to the convergent one (r = .29). This item will be excluded from the Physical Component in further studies. Within the Mental Component, the pattern is less clear. The items for Mood, Energy and Contact have partly high discriminative correlations. Considering them as separate aspects is hardly supported by these data. On the other hand, between the Mental Component and the Physical Component or Food there are no problems with high discriminative correlations.
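The two kinds of correlations discussed above can be sketched in a few lines of plain Python. The corrected convergent correlation relates an item to the sum of the other items of its own scale, and the discriminative correlation relates it to the sum score of a foreign scale. The function names and the small data set are hypothetical and serve only as an illustration; they are not the study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corrected_convergent(own_items, index):
    """Correlate one item with the sum of the other items of its own scale."""
    item = [row[index] for row in own_items]
    rest = [sum(row) - row[index] for row in own_items]
    return pearson(item, rest)


def discriminant(own_items, index, foreign_items):
    """Correlate one item with the sum score of a foreign scale."""
    item = [row[index] for row in own_items]
    foreign = [sum(row) for row in foreign_items]
    return pearson(item, foreign)


# Hypothetical coded answers (0-100) of five respondents, complete cases only.
physical = [
    [100, 75, 100],
    [75, 75, 100],
    [50, 25, 50],
    [100, 100, 75],
    [25, 50, 25],
]
energy = [
    [75, 50],
    [25, 50],
    [50, 75],
    [50, 25],
    [50, 50],
]

r_convergent = corrected_convergent(physical, 0)
r_discriminant = discriminant(physical, 0, energy)
```

An item fits its scale in the sense used here when `r_convergent` is clearly larger in absolute value than `r_discriminant`. Removing the item from its own sum score before correlating avoids inflating the convergent value through the item's correlation with itself.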

Table 3 displays the correlations between the subscales. It can be seen that the items of the Mental Component are highly correlated, while the associations to Food and the Physical Component are low.

| Dimension | Me* | Mo | En | Co | Fo |
|---|---|---|---|---|---|
| Mood | .87 | | | | |
| Energy | .83 | .64 | | | |
| Contact | .74 | .42 | .45 | | |
| Food | .28 | .27 | .21 | .19 | |
| Physical C. | .27 | .20 | .25 | .18 | .13 |

Table 3. Dimension intercorrelations (Pearson’s r). *Me = Mental Component, Mo = Mood, En = Energy, Co = Contact, Fo = Food.

The Stark QoL is a short and easy-to-handle questionnaire. Acceptability of the questionnaire was very good, which was reflected in a relatively low number of missing data compared with other studies [15]. It demonstrates good reliability for its two main components: Mental and Physical Quality of Life. The widely used Short Form-36 [2], for example, is often reduced to these two dimensions. The reliability of an additional single item for availability of food is almost acceptable. It is possible, however, that the present sample, i.e. German medical students, has a low variability on this item. Almost all of them probably have enough to eat. The three sub-dimensions of the Mental Component should not be analyzed separately; the reliabilities are too low and some discriminative correlations too high. But both components, mental and physical, display sufficient homogeneity. Regarding the Physical Component, the result was surprising, because there is obviously a difference between tying a shoe and changing a light bulb. However, a dimension Physical Functioning showing sufficient reliability can be constructed utilizing seven items. Concerning the Mental Component, homogeneity was not surprising. A person in a good mood would usually be expected to show energy and feel more integrated into the peer group than one who feels sad.

In the Stark QoL, the small number of sub-items probably leads to low reliability for single items. There is an ongoing discussion on applying short or even very short versions of scales in various fields, and there are examples where this has worked rather well [16]. Others still use long scales or instruments with scales of varying length [17], even when there is high internal consistency [18]. With a scale of three sub-items, however, it seems difficult to reach sufficient reliability, even when the underlying construct is as simple as having enough food or not.
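The link between the number of sub-items and reliability can be made concrete with the textbook formula for standardized alpha, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r their mean inter-item correlation, together with the Spearman-Brown relation. The following sketch is only a rough illustration and not an analysis from this study: the alphas reported here were computed with Stata and are not necessarily standardized, and the function names are hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)


def implied_mean_r(k, alpha):
    """Mean inter-item correlation implied by a given standardized alpha."""
    return alpha / (k - (k - 1) * alpha)


def items_needed(alpha_now, k_now, alpha_target):
    """Spearman-Brown: number of comparable items needed to reach alpha_target."""
    factor = alpha_target * (1.0 - alpha_now) / (alpha_now * (1.0 - alpha_target))
    return factor * k_now


# Illustration with the value reported for Energy (three sub-items, alpha = .59),
# treated here as if it were a standardized alpha.
r_energy = implied_mean_r(3, 0.59)
k_for_80 = items_needed(0.59, 3, 0.80)
```

Under these assumptions, `r_energy` is the mean correlation among three sub-items that would produce the observed alpha, and `k_for_80` indicates how many comparable sub-items would be required for an alpha of .80.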

The present study has the following limitations: (1) The data rely on a highly selected convenience sample, i.e. medical students. (2) This paper reports only on reliability, not on validity.

With the given limitations, we believe that the Stark QoL constitutes an interesting alternative to conventional questionnaires for assessing quality of life. It is short and efficiently measures two core dimensions of quality of life: a mental and a physical component. Its translation into other languages should be easy. Currently, evaluation of mental and physical health is particularly lacking in low- and middle-income countries [19], where the Stark QoL may come to be of particular value.

Conflict of interest

The author declares no conflict of interest.

References

  1. Group EQoL, Projects: QLQ modules, 2014. Available from:
  2. Newnham E, Harwood K, Page A. Evaluating the clinical significance of responses by psychiatric inpatients to the mental health subscales of the SF-36. J Affect Disord. 2007;98:91-7
  3. Macefield R, Jacobs M, Korfage I, Nicklin J, Whistance R, Brookes S, et al. Developing core outcomes sets: methods for identifying and including patient-reported outcomes (PROs). Trials. 2014;15:49
  4. Makai P, Brouwer W, Koopmanschap M, Stolk E, Nieboer A. Quality of life instruments for economic evaluations in health and social care for older people: a systematic review. Soc Sci Med. 2014;102:83-93
  5. Nelson E, Landgraf J, Hays R, Wasson J, Kirk J. The functional status of patients. How can it be measured in physicians' offices? Med Care. 1990;28:1111-26
  6. Landgraf J, Nelson E. Summary of the WONCA/COOP International Health Assessment Field Trial. The Dartmouth COOP Primary Care Network. Aust Fam Physician. 1992;21:255-7, 260-2, 266-9
  7. de Azevedo-Marques J, Zuardi A. COOP/WONCA charts as a screen for mental disorders in primary care. Ann Fam Med. 2011;9:359-65
  8. Roth A, Kornblith A, Batel-Copel L, Peabody E, Scher H, Holland J. Rapid screening for psychologic distress in men with prostate carcinoma: a pilot study. Cancer. 1998;82:1904-8
  9. Loquai C, Scheurich V, Syring N, Schmidtmann I, Müller-Brenne T, Werner A, et al. Characterizing psychosocial distress in melanoma patients using the expert rating instrument PO-Bado SF. J Eur Acad Dermatol Venereol. 2014;28:1676-84
  10. Iskandarsyah A, De Klerk C, Suardi D, Soemitro M, Sadarjoen S, Passchier J. The Distress Thermometer and its validity: a first psychometric study in Indonesian women with breast cancer. PLoS ONE. 2013;8:e56353
  11. Bundesministerium für Ernährung und Landwirtschaft. Welternährung & FAO. 2014. Available from:
  12. Scharnetzky E. Nutzung und Linkage von Routinedaten aus Sicht einer Krankenkasse. 12. Deutscher Kongress für Versorgungsforschung 24. Oktober 2013, Berlin. Available from:
  13. Lauritzen J. epidata. 2007.
  14. StataCorp, Stata Statistical Software: Release 12. 2011, StataCorp LP: College Station, Texas.
  15. Coste J, Quinquis L, Audureau E, Pouchot J. Non response, incomplete and inconsistent responses to self-administered health-related quality of life measures in the general population: patterns, determinants and impact on the validity of estimates - a population-based study in France using the MOS S. Health Qual Life Outcomes. 2013;11:44
  16. Vilagut G, Forero C, Pinto-Meza A, Haro J, de Graaf R, Bruffaerts R, et al. The mental component of the short-form 12 health survey (SF-12) as a measure of depressive disorders in the general population: results with three alternative scoring methods. Value Health. 2013;16:564-73
  17. Wang S, Wu B, Leng L, Bucala R, Lu L. Validity of LupusQoL-China for the assessment of health related quality of life in Chinese patients with systemic lupus erythematosus. PLoS ONE. 2013;8:e63795
  18. Okamoto N, Hisashige A, Tanaka Y, Kurumatani N. Development of the Japanese 15D instrument of health-related quality of life: verification of reliability and validity among elderly people. PLoS ONE. 2013;8:e61721
  19. Kularatna S, Whitty J, Johnson N, Scuffham P. Health state valuation in low- and middle-income countries: a systematic review of the literature. Value Health. 2013;16:1091-9
ISSN : 2334-1009