Public Use Data
As part of the Health Department's ongoing commitment to make data from the New York City Community Health Survey (CHS) available for analysis, grant writing, policymaking and program development, we provide downloadable datasets for use by researchers, students, and the public health community.
Datasets are available on this page for each year of the CHS. For more information on the CHS, visit our Methodology page.
Prior to downloading the datasets below, please review the following information:
- Due to the complex sampling design of the CHS, data must be analyzed in a software program capable of handling complex survey data (such as SUDAAN or Stata). Annotated sample code for analysis using SAS and SUDAAN is provided for each year, and those programs include more information about design statements and nesting variables.
- Specific items on the CHS change from year to year as the Health Department develops initiatives or new public health issues emerge. Please review our brief question matrix (pdf) to determine which year(s) contain your topics of interest.
- Although a topic may be included every year, question wording or response categories may vary from one year to the next. These changes may be minor or substantial, and comparing estimates across years may not always be appropriate. The CHS variable crosswalk (xls) can be used as a guide to assess comparability, but all users are strongly encouraged to review the questionnaires and compare data across years of the CHS with caution. Please send an email to email@example.com if you need guidance about comparing questions across years.
- Some conditions may be rare, or the sample sizes for some populations may be quite small, making CHS estimates potentially unreliable. Organizations have different guidelines for when an estimate should be considered unreliable. Our suggested guidelines for CHS data reliability (PDF) incorporate the relative standard error (RSE), confidence interval width, and sample size.
- Some variables are not included in the datasets below but may be requested through a Data Use Agreement. If you would like datasets that contain variables noted in the codebooks as available through a Data Use Agreement, please send an email to firstname.lastname@example.org.
- For any published analysis using downloadable CHS data, please reference the URL of this webpage, the CHS year(s) analyzed, and the date on which the dataset was downloaded (suggested citation below). On occasion, updates are made to some variables. Would you like to be notified when CHS datasets are released or updated? Please send an email to email@example.com.
- Analyses of smoking data for 2003 should use the special smoking dataset (CHS 2003 smoking) provided below.
- CHS began including a cell-phone-only sample in 2009. For most measures, including a cell-phone-only sample has had only a nominal effect (see Epi Research Report - Results from the 2008 Cell Phone Pilot Study (PDF)). However, researchers should use caution when comparing multiple years of CHS data and note any differences in the populations surveyed.
- Most variables in the CHS have very few missing values. Responses of “don’t know” and “refused” are coded as missing (.d and .r, respectively). In select cases, “don’t know” is coded as a non-missing response category: because it was intended as a valid response, because it accounts for more than 10% of responses, or to maintain historical consistency. By default, SAS and SUDAAN exclude missing values from analysis. If you are interested in using methods such as imputation to address missing data, please send an email to firstname.lastname@example.org (include "imputation" in the subject heading).
- If an analysis requires the combination of multiple years of data, a multi-year weight is needed. Multi-year weights are only available for data years 2002-2008, 2009-2013 and 2010-2016. Survey data from 2009-2012 cannot be combined with earlier years. Please send an email to email@example.com for more information on multi-year survey weights.
- In 2011, the Health Department updated the weighting methodology for the Community Health Survey, consistent with other large state and national surveys. The new weighting methods incorporate Census 2010 data and additional demographic characteristics to best represent the population of adult New Yorkers. After analyzing the possible effects of these changes, the Health Department found that the updated methodology has minimal or no effect on CHS health estimates and does not affect the interpretation of trends in prevalence (percentages) over time. Full details can be found in the methodology update report (PDF).
- Age adjustment can be used to compare prevalence estimates 1) between NYC and other jurisdictions, 2) between groups within NYC, or 3) over time. It estimates what the prevalence would be if the populations being compared had the same age distribution. Additional standard age adjustment weights can be found here.
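The notes on complex sampling design and missing values can be illustrated with a minimal Python sketch of what survey software does for a weighted prevalence: it computes a weighted point estimate and a Taylor-linearized standard error that accounts for strata and primary sampling units (PSUs). This is not the CHS sample code. It assumes a stratified design with PSUs treated as sampled with replacement, and the inputs `y`, `w`, `strata`, and `psu` are hypothetical names; consult the annotated SAS/SUDAAN programs for each year's actual design variables and nesting statements.

```python
from collections import defaultdict
from math import sqrt
from typing import Hashable, Optional, Sequence, Tuple


def weighted_prevalence(
    y: Sequence[Optional[float]],
    w: Sequence[float],
    strata: Sequence[Hashable],
    psu: Sequence[Hashable],
) -> Tuple[float, float]:
    """Weighted mean of a 0/1 outcome and its Taylor-linearized standard error.

    Assumes (hypothetically) a stratified design with PSUs sampled with
    replacement. Records with y=None (e.g. "don't know"/"refused") are left
    out of the estimate but kept in their PSU so the design is preserved.
    """
    # Point estimate over non-missing records only
    total_w = sum(wi for yi, wi in zip(y, w) if yi is not None)
    est = sum(wi * yi for yi, wi in zip(y, w) if yi is not None) / total_w

    # Linearized score z_i = w_i * (y_i - est) / total_w, summed within each PSU;
    # records with a missing outcome contribute 0 but still register their PSU
    psu_totals = defaultdict(float)
    for yi, wi, h, j in zip(y, w, strata, psu):
        psu_totals[(h, j)] += 0.0 if yi is None else wi * (yi - est) / total_w

    # Group the PSU totals by stratum
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)

    # Between-PSU variance within each stratum, scaled by n_h / (n_h - 1)
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # single-PSU stratum skipped here; survey software offers other options
        mean_z = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - mean_z) ** 2 for z in zs)

    return est, sqrt(var)
```

For example, with `y = [1, 0, 1, None]`, `w = [1, 1, 2, 1]`, `strata = [1, 1, 2, 2]`, and `psu = [1, 2, 3, 4]`, the estimate is 0.75 because the fourth record (a missing response) is excluded. Ignoring the strata and PSUs would generally give a different, and usually too small, standard error, which is why the page asks for software that handles complex survey data.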
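The reliability note names three criteria: relative standard error (RSE), confidence interval width, and sample size. The sketch below shows how those quantities can be checked for a single estimate. RSE is computed as 100 × SE / estimate. The cutoffs `max_rse`, `max_ci_width`, and `min_n` are deliberately left as required arguments because the actual CHS thresholds, and how the criteria are combined, are defined in the suggested guidelines (PDF), not here; the values in the usage example are illustrative placeholders only.

```python
import math
from typing import List


def relative_standard_error(estimate: float, standard_error: float) -> float:
    """RSE as a percentage: 100 * SE / estimate."""
    if estimate == 0:
        return math.inf  # RSE is undefined for a zero estimate; treat as maximally unstable
    return 100.0 * standard_error / estimate


def flag_estimate(
    estimate: float,
    standard_error: float,
    ci_lower: float,
    ci_upper: float,
    n: int,
    *,
    max_rse: float,
    max_ci_width: float,
    min_n: int,
) -> List[str]:
    """Return which (hypothetical) reliability criteria an estimate fails.

    Estimate, SE, and CI bounds are assumed to share the same units (e.g. percent).
    """
    reasons = []
    if relative_standard_error(estimate, standard_error) > max_rse:
        reasons.append("rse")
    if ci_upper - ci_lower > max_ci_width:
        reasons.append("ci_width")
    if n < min_n:
        reasons.append("sample_size")
    return reasons
```

With placeholder cutoffs of `max_rse=30.0`, `max_ci_width=10.0`, and `min_n=50`, an estimate of 4.0% with SE 1.6, a 95% CI of 1.5% to 8.0%, and 40 respondents fails on RSE (40%) and sample size, but not on CI width (6.5 points).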
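The age adjustment note can be made concrete with direct standardization: each age-specific prevalence is multiplied by the share of a standard population in that age group, and the products are summed. The minimal sketch below assumes that method; the age groups and weights in the usage example are hypothetical and are not the standard age adjustment weights referenced above.

```python
from typing import Dict


def age_adjusted_prevalence(
    age_specific: Dict[str, float],
    standard_weights: Dict[str, float],
) -> float:
    """Directly age-standardized prevalence.

    age_specific maps each age group to its prevalence (e.g. percent);
    standard_weights maps the same groups to standard-population counts
    or proportions, which are normalized to sum to 1.
    """
    if set(age_specific) != set(standard_weights):
        raise ValueError("age groups must match the standard population's groups")
    total = sum(standard_weights.values())
    # Weighted sum of age-specific prevalences, divided by the total weight
    return sum(age_specific[g] * standard_weights[g] for g in standard_weights) / total
```

For hypothetical groups `{"18-24": 10.0, "25-44": 20.0, "45-64": 30.0, "65+": 40.0}` and weights `{"18-24": 1, "25-44": 4, "45-64": 3, "65+": 2}`, the age-adjusted prevalence is 26.0%. Two populations are comparable this way only when both are standardized to the same set of weights.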
CHS Public Use Datasets