Community Health Survey Analytic Considerations

Be mindful of the following analytic considerations for the Community Health Survey (CHS):

  • Items on the CHS change as the Health Department develops initiatives or new public health issues emerge. The question matrix (PDF) lists the topics asked each year.

  • Although a topic may be included every year, question wording or response categories may vary from one year to the next. These changes may be minor or substantial, and comparing estimates across years may not always be appropriate. The CHS variable crosswalk (EXCEL) can be used as a guide to assess comparability. If you need guidance about comparing questions across years, email the Health Department.

  • Some conditions may be rare, or the sample sizes for some populations quite small, making CHS estimates potentially unreliable. Organizations have various guidelines for when an estimate should be considered unreliable. Our suggested guidelines for CHS data reliability (PDF) incorporate relative standard error (RSE), confidence interval width and sample size.

  • Some variables are not included in the datasets available here but may be requested through a Data Use Agreement. The codebooks note which variables are available this way. To request datasets that contain them, email the Health Department.

  • Analyses of smoking data for 2003 should use the special smoking dataset (CHS 2003 smoking).

  • CHS began including a cell-phone-only sample in 2009. For most measures, the inclusion of a cell-phone-only sample has had only a nominal effect (see Epi Research Report — Results from the 2008 Cell Phone Pilot Study (PDF)). However, researchers should use caution when comparing multiple years of CHS data and note any difference in the populations surveyed.

  • Most variables in the CHS have very few missing values. Responses of "don’t know" and "refused" are coded as missing (.d and .r, respectively). In select cases, "don’t know" is coded as a non-missing response category for one of three reasons: it was intended to be a valid response, it accounts for more than 10% of responses, or the coding maintains historical consistency. By default, SAS and SUDAAN exclude missing values from analysis. In recent years, the Health Department has provided imputed versions of some variables, as noted in the codebooks. If you are interested in using methods such as imputation to address missing data, email the Health Department.

  • Age adjustment can be used to compare prevalence estimates: 1) between NYC and other jurisdictions; 2) between groups within NYC; or 3) over a period of time. It estimates what the prevalence would be if the populations being compared had the same age distribution. See additional standard age adjustment weights (SAS).

  • If an analysis requires the combination of multiple years of data, a multi-year weight is needed. Multi-year weights are available only for data years 2002-2008, and for select two-, three- and five-year combinations from 2009-2019. Survey data from 2009 and after cannot be combined with earlier years. For more information on multi-year survey weights, email the Health Department.
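
The reliability consideration above names three inputs: relative standard error (RSE), confidence interval width and sample size. As a rough sketch of how such a screen might be coded, the Python below computes RSE as the standard error divided by the estimate (times 100) and lists which checks an estimate fails. The function names and all three cutoff values are placeholders chosen for illustration and are not the Health Department's thresholds; the suggested guidelines for CHS data reliability (PDF) remain the authority. Estimates and interval bounds are assumed to be expressed in percentage points.

```python
def relative_standard_error(estimate, standard_error):
    """Return the RSE as a percentage: 100 * SE / |estimate|."""
    if estimate == 0:
        # RSE is undefined for a zero estimate; treat it as unbounded.
        return float("inf")
    return 100.0 * standard_error / abs(estimate)


def flag_unreliable(estimate, standard_error, ci_lower, ci_upper, n,
                    max_rse=30.0, max_ci_width=20.0, min_n=50):
    """Return the list of reliability checks that an estimate fails.

    The default cutoffs are hypothetical placeholders, not published
    CHS thresholds. An empty list means no check was triggered.
    """
    reasons = []
    if relative_standard_error(estimate, standard_error) > max_rse:
        reasons.append("rse")
    if (ci_upper - ci_lower) > max_ci_width:
        reasons.append("ci_width")
    if n < min_n:
        reasons.append("sample_size")
    return reasons
```

For example, `flag_unreliable(2.0, 1.0, 0.5, 5.0, 40)` would report both the RSE and the sample size checks under these placeholder cutoffs.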
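
The missing-data consideration above notes that SAS and SUDAAN drop missing values by default. Analysts working outside those packages may need to reproduce that behavior themselves. The sketch below shows one way to do so for a weighted prevalence, removing "don’t know" and "refused" from both the numerator and the denominator. It assumes, purely for illustration, that the special missing codes arrive as the strings ".d" and ".r" after export; the actual representation depends on how the dataset is read. The function name `weighted_prevalence` is hypothetical, and the sketch produces a point estimate only, without the design-based variance that a survey package would supply.

```python
# Assumption: special missing values appear as these strings after export.
MISSING_CODES = {".d", ".r"}


def weighted_prevalence(responses, weights, yes_value=1):
    """Weighted share of `yes_value` among non-missing responses."""
    numerator = 0.0
    denominator = 0.0
    for response, weight in zip(responses, weights):
        # Skip "don't know" / "refused" (and empty values) entirely,
        # so they count toward neither numerator nor denominator.
        if response is None or response in MISSING_CODES:
            continue
        denominator += weight
        if response == yes_value:
            numerator += weight
    if denominator == 0:
        raise ValueError("no non-missing responses")
    return numerator / denominator
```

A variable whose "don’t know" responses are coded as a valid category would keep them in the denominator, so it should not be passed through a filter like this one.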
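
The age adjustment consideration above can be illustrated with direct standardization, a common approach in which each age group's prevalence is weighted by that group's share of a standard population. The sketch below is a minimal version under stated assumptions: the age groups and weights are invented for the example, and the function name `age_adjusted_prevalence` is hypothetical. Real analyses should use the standard age adjustment weights referenced above.

```python
def age_adjusted_prevalence(age_specific_prevalence, standard_weights):
    """Directly standardized prevalence: sum of weight * prevalence.

    Both arguments are dicts keyed by the same age-group labels.
    Weights are normalized here, so they need not sum to exactly 1.
    """
    if age_specific_prevalence.keys() != standard_weights.keys():
        raise ValueError("age groups must match between inputs")
    total_weight = sum(standard_weights.values())
    return sum(
        age_specific_prevalence[group] * standard_weights[group] / total_weight
        for group in standard_weights
    )


# Hypothetical inputs, for illustration only.
example_prevalence = {"18-44": 0.10, "45-64": 0.20, "65+": 0.30}
example_weights = {"18-44": 0.5, "45-64": 0.3, "65+": 0.2}
example_adjusted = age_adjusted_prevalence(example_prevalence, example_weights)
```

Applying the same standard weights to two populations removes differences that arise only from their age structures, which is what makes the adjusted estimates comparable.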