*Call in the CHS 2010. Working data set is called chs2010
*This program provides sample code to use when analyzing survey data
*There are 8665 observations and 188 variables in the dataset
*
*The stratification (nesting) variable is strata
*The survey weight variable is wt11_dual
*
*For more information, please contact:
*  NYC Department of Health & Mental Hygiene
*  Bureau of Epidemiology Services
*  EpiDatarequest@health.nyc.gov
*********************************************************************;

/*Enter the path where the dataset and format programs are stored*/
libname intdat 'x';
filename formatin 'x\formatstatements_chs2010_public.sas';
%include 'x\formats_chs2010_public.sas';

data chs2010;
  set intdat.chs2010_public;
run;

proc contents data=chs2010;
run;

/********Instructions for analyzing CHS 2010 data*****************
Survey data need to be analyzed using a special procedure in SAS
(proc surveymeans), SUDAAN, or another software package that can
handle complex survey designs.

If you are only interested in point estimates (i.e., you do not need
standard errors or confidence intervals), using the weight statement
in regular SAS procedures will give the correct point estimates.
However, to get correct standard errors and confidence intervals you
must use proc surveymeans or another software program capable of
accounting for the complex survey design.
************************************************************************************/

**Sample code: Standard errors will not be correct with regular SAS procs,
  but point estimates will be fine. Remember to use the weight statement;
proc freq data = chs2010;
  tables sex*(smoker generalhealth);
  weight wt11_dual;
run;

**Sample code for proc surveymeans - standard errors are correct.
  Same point estimates as code above;
proc surveymeans data = chs2010 nobs mean clm sum std clsum;
  strata strata;                *survey design information;
  weight wt11_dual;             *weight statement;
  var smoker generalhealth;     *variables you are interested in analyzing;
  class smoker generalhealth;   *all variables in var statement that are categorical;
  domain sex;                   *variable to see estimates stratified by;
run;

**Sample code for SUDAAN, proc descript**;

/*MUST SORT DATA BY STRATIFICATION VARIABLE FIRST*/
proc sort data=chs2010;
  by strata;
run;

/*NOW RUN PROC DESCRIPT*/
proc descript data=chs2010 filetype=sas design=strwr;
  nest strata;                  *survey strata variables*;
  weight wt11_dual;             *survey weight variable*;
  var smoker smoker smoker
      generalhealth generalhealth generalhealth generalhealth generalhealth;
                                *variables you are interested in analyzing;
  catlevel 1 2 3 1 2 3 4 5;     *specify the levels of each variable you want*;
  tables _one_ sex;             *_one_ will give you the overall total for each variable:
                                 sex will produce the gender-specific estimates*;
  subgroup _one_ sex agegroup;  *all variables on the tables statement must also be in the subgroup statement.
                                 agegroup is needed for age-adjustment*;
  levels 1 2 4;                 *specify the levels of the variables above*;

  /*for age-adjustment of estimate: use the US 2000 Standard Population*/
  stdvar agegroup;
  stdwgt 0.128810 0.401725 0.299194 0.170271;
  /*These weights are for agegroup total: different age adjustment weights
    are needed for variables that use other agegroups*/

  setenv decwidth=1;            /*Produce output with results rounded to 1 decimal place*/
  print/style=nchs;             *will print the results*;
  output/filename=output10 filetype=sas tablecell=default replace;
                                *produces an output dataset of results*;
  title1 'Prevalence of Smoking Status and General Health Status, by Gender: CHS2010';
run;

/*Compute the relative standard error of the estimates:
  Estimates with RSE >=0.30 or sample sizes <50 are considered unstable:
  http://www1.nyc.gov/assets/doh/downloads/pdf/episrv/bes_data_reliability.pdf
*/
data rsecheck;
  set output10;                 *use the output dataset created from the proc descript*;
  length flag $2;               *flag holds up to two characters;

  if percent in (0.00, 100.00) then do;
    if nsum >= 50 then flag = '**';
    if nsum < 50 then flag = '^';
  end;
  else if percent not in (0.00, 100.00) then do;
    rse = sepercent/percent;    *relative standard error;
    ciband = uppct-lowpct;      *full width of the confidence interval;
    halfw = ciband/2;           *half-width of the confidence interval;

    if sepercent = 0.0 and ciband = 0.0 then flag = '^';
    else if rse >= 0.5 then do;
      if ciband >= 6 then flag = '^';
      else if ciband < 6 then flag = '*';
    end;
    else if rse < 0.3 then do;
      if nsum < 50 then flag = '*';
      else if nsum >= 50 then do;
        if halfw > 10 then flag = '*';
      end;
    end;
    else if 0.5 > rse >= 0.3 then flag = '*';
  end;
run;

options ls = 150;

proc print data = rsecheck noobs;
  var flag rse nsum ciband halfw percent lowpct uppct;
  where flag in ('*', '^', '**');
run;

/*For more details on age-adjustment, see:
  Klein RJ, Schoenborn CA. Age adjustment using the 2000 projected U.S.
  population. Healthy People Statistical Notes, no. 20. Hyattsville,
  Maryland: National Center for Health Statistics. January 2001.
  http://www.cdc.gov/nchs/data/statnt/statnt20.pdf
*/
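
/*A minimal sketch of the direct age standardization that the stdvar and
  stdwgt statements request in proc descript: the age-adjusted estimate is
  the sum, over the four agegroup levels, of the standard weight times the
  age-specific estimate. The dataset name ageadj_demo, the variable names
  wsum and adjpct, and the age-specific percents are hypothetical and
  chosen only for illustration; they are not CHS 2010 results. Only the
  four weights are taken from the stdwgt statement in this program.*/
data ageadj_demo;
  array w[4] _temporary_ (0.128810 0.401725 0.299194 0.170271);  *US 2000 standard weights from stdwgt;
  array p[4] _temporary_ (20.0 18.0 15.0 8.0);                   *hypothetical age-specific percents;
  wsum = 0;
  adjpct = 0;
  do i = 1 to 4;
    wsum   = wsum + w[i];         *running total of the standard weights;
    adjpct = adjpct + w[i]*p[i];  *weighted sum of age-specific percents;
  end;
  *the standard weights should sum to 1, so write a note to the log if they do not;
  if abs(wsum - 1) > 1e-6 then put 'WARNING: standard weights do not sum to 1: ' wsum=;
  drop i;
run;

proc print data = ageadj_demo noobs;
  var wsum adjpct;
run;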