*** 1-to-1_mapping_SIC80-SIC92.do ***
* Last modified: 17 June 2012 *
***
*** NOTES ***
*** 4-digit SIC 1980 Activities (312) are mapped onto consistent 2-digit SIC 1992 Divisions (60) and consistent 1-digit SIC 1992 Sections (17).
*** 1. A direct correspondence is already provided by the LFS in the Sep-Nov 1993 Seasonal Quarterly dataset, which contains SIC 1980 variables and SIC 1992 Sections.
*** 2. This file expands the mapping to include 2-digit SIC 1992 Divisions. It also enables a check by mapping to 1-digit SIC 1992 Sections as well.
***
***

*** DEFINE DATA AND DEFAULT LOCATIONS, AND LOG FILE ***
global rawdatalocate C:\Users\Jennifer Smith\Documents\datasets and programs\LFS
global locate C:\Users\Jennifer Smith\Documents\Missing middle
display "rawdatalocate is " "$rawdatalocate"
display "locate is " "$locate"
* If enabled, log to a .log file: with -replace-, a .do target would overwrite this do-file.
* log using "1-to-1_mapping_SIC80-SIC92.log", text replace
***
***

*** 1. USE SEPTEMBER-NOVEMBER AUTUMN 1993 SEASONAL QUARTERLY DATASET WHICH CONTAINS ALL SIC 1980 VARIABLES AND 1-DIGIT SIC 1992 VARIABLES ***
* indmain and indlast are 4-digit level SIC 1980 Activity, coded 1-321 plus 999, with 313-321 and 999 being missing values.
* sicmain and siclast are 1-digit SIC 1992 Section, coded 1-20, with 18-20 being missing values.
local getvars "quota week w1yr qrtr add wavfnd hhld recno sicmain siclast indmain indlast"
use `getvars' using "$rawdatalocate\qlfssn93.dta", clear

* Recode the missing-value codes to Stata missing (mvdecode).
mvdecode indmain indlast, mv(313 314 315 316 317 318 319 320 321 999)
mvdecode sicmain siclast, mv(18 19 20)

* Combine main-job and last-job industry: use indlast/siclast only where indmain/sicmain is missing.
qui gen sic80_4dig = indmain
qui replace sic80_4dig = indlast if sic80_4dig==. & indlast~=.
label var sic80_4dig "SIC80 4-digit Activity, combining indmain and indlast"
qui gen sic92_1dig = sicmain
qui replace sic92_1dig = siclast if sic92_1dig==. & siclast~=.
label var sic92_1dig "SIC92 1-digit Section, combining sicmain and siclast"
qui drop if sic80_4dig==. | sic92_1dig==.
keep sic80_4dig sic92_1dig

* Check whether the SIC80-SIC92 mapping is 1-to-1 or not.
* 1-to-1 mapping is indicated by same (uniform) value for sic92_1dig over all cases of sic80_4dig.
sort sic80_4dig sic92_1dig
qui by sic80_4dig: egen sic92_1dig_mean = mean(sic92_1dig)
qui gen sic92_uniformoversic80 = 1 if sic92_1dig_mean==sic92_1dig
qui replace sic92_uniformoversic80 = 0 if sic92_1dig_mean~=sic92_1dig
save "$locate\qlfs93sn_1-to-1mapping.dta", replace
tab sic92_uniformoversic80
* All 1s for sic92_uniformoversic80 indicates that the mapping is indeed 1-to-1.
* Therefore it is fine to collapse by sic80_4dig to give a direct 1-to-1 correspondence from each sic80_4dig to a single sic92_1dig.
keep sic80_4dig sic92_1dig
collapse sic92_1dig, by(sic80_4dig)
* Copy to spreadsheet 1-to-1_mapping_SIC80-SIC92_LFS.xlsx, code the recode.
* Copy to text file, rearrange.
* Copy label text for SIC92 Section using "label list sicmain", copy to spreadsheet, text to columns, copy to text file, rearrange.
* Resulting STATA do-files are in8092sm.do and in8092sl.do.
***
***

*** 2. USE STAYERS TO BOTH CONFIRM THE COINCIDENT MAPPING IN 1. ABOVE AND ALSO TO MAP 4-DIGIT SIC 1980 ACTIVITY TO 2-DIGIT SIC 1992 DIVISION AS WELL AS 1-DIGIT SIC 1992 SECTION.
* Note: Industry one year ago is absent from both datasets.
*** Get seasonal datasets relating to 1993q4 to obtain SIC*, which is missing because the SIC classification changed during the calendar quarter (i.e. at the end of the first seasonal quarter in that calendar quarter). ***
* Loop around seasonal quarters.
global rawdatalocate C:\Users\Jennifer Smith\Documents\datasets and programs\LFS
global locate C:\Users\Jennifer Smith\Documents\Missing middle

foreach X in 1 2 {
    if `X'==1 {
        use quota week w1yr qrtr add wavfnd hhld recno indmain indlast conmpy consey conmon using "$rawdatalocate\qlfssn93.dta", clear
        gen seasqrtr = 1
    }
    else if `X'==2 {
        use quota week w1yr qrtr add wavfnd hhld recno indd92m inds92m indd92l inds92l conmpy consey conmon pwt03 using "$rawdatalocate\qlfsd93f.dta", clear
        gen seasqrtr = 2
    }
    else {
        display "HELP"
    }

    * Recode missing-value codes. Only one set of industry variables exists in each dataset.
    mvdecode conmon conmpy consey, mv(-9 -8)
    capture confirm variable indmain
    if ~_rc {
        * Same SIC80 missing codes as in section 1 (313-321 and 999); 312 is a valid Activity code.
        mvdecode indmain indlast, mv(313 314 315 316 317 318 319 320 321 999)
    }
    capture confirm variable indd92m
    if ~_rc {
        mvdecode indd92m indd92l, mv(-9 61 62 63)
        mvdecode inds92m inds92l, mv(-9 18 19 20)
    }

    * Create pid, for each quarter.
    capture drop temp*
    qui gen temp1 = quota
    qui gen temp2 = week
    qui gen temp3 = w1yr
    qui gen temp4 = qrtr
    qui gen temp5 = add
    qui gen temp6 = wavfnd
    qui gen temp7 = hhld
    qui gen temp8 = recno
    * tostring needs generate() or replace, so a single call with replace suffices.
    capture tostring temp1-temp8, replace
    * Left-pad quota to three digits.
    qui replace temp1 = "00" + temp1 if length(temp1)==1
    qui replace temp1 = "0" + temp1 if length(temp1)==2
    * Left-pad single-digit (1-9) week, add, hhld and recno to two digits.
    foreach v of varlist temp2 temp5 temp7 temp8 {
        qui replace `v' = "0" + `v' if regexm(`v', "^[1-9]$")
    }
    qui gen str20 hid_string = temp1+temp2+temp3+temp4+temp5+temp6+temp7
    qui gen str20 pid_string = hid_string+temp8
    qui destring hid_string, generate(hid)
    format hid %13.0f
    qui destring pid_string, generate(pid)
    format pid %15.0f
    label var hid_string "String version of hid = quota+week+w1yr+qrtr+add+wavfnd+hhld"
    label var hid "Household id = quota+week+w1yr+qrtr+add+wavfnd+hhld"
    label var pid_string "String version of pid = quota+week+w1yr+qrtr+add+wavfnd+hhld+recno/persno"
    label var pid "Person id = quota+week+w1yr+qrtr+add+wavfnd+hhld+recno/persno"
    capture drop temp*
    qui compress
    sort pid hid

    if `X'==1 {
        save "$locate\qlfs93sn.dta", replace
    }
    else if `X'==2 {
        save "$locate\qlfs93df.dta", replace
    }
}

*** CREATE 1-TO-1 MAPPING USING SEASONAL QUARTERLY DATASETS ***
* Select only those who remain in the same job between [September-]November 1993 (SIC80) and December 1993[-February 1994] (SIC92).
* Because of the reinterview timetable, those observed in December 1993 (SIC92) were previously interviewed in September 1993 (using SIC80), and those interviewed in October and November 1993 (SIC80) are next interviewed in January and February 1994 (using SIC92).
* Use month started current position to identify job stayers. A cohort is interviewed every three months, so consecutive SIC observations on the same pid are three months apart. Stayers are therefore defined as those who started their current position more than three months ago.
* Then create mapping based on correspondence between 4-digit SIC80 and 2-digit SIC92.
*
* conmpy + consey + conmon are used to define job stayers.
* NB: Until part-way through qlfsjm98, conmpy and consey refer to the last 2 digits of the year the individual started working for their current employer. From part-way through qlfsjm98, the data refer to the 4-digit year.
* Small versions of the relevant seasonal datasets were created above to include pid and mvdecode industry variables.
use "$locate\qlfs93sn.dta", clear
append using "$locate\qlfs93df.dta"
tab seasqrtr
* Non-panel data are no use for defining job stayers and creating the Mapping, so drop those with only one observation over the two quarters.
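
* Optional sanity check, offered as a sketch and not part of the original workflow.
* The pidcount==2 rule used next is meant to pick out people observed once in each
* seasonal quarter, which assumes pid is unique within seasqrtr. The check below only
* reports whether that assumption holds in the appended data; it does not stop the do-file.
capture noisily isid pid seasqrtr, missok
if _rc {
    display as error "Warning: pid may not be unique within seasqrtr; pidcount==2 could include within-quarter duplicates."
}
* Now count observations per pid.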
sort pid
by pid: egen pidcount = count(pid)
tab pidcount
qui drop if pidcount~=2
sort pid seasqrtr

* Create job stayer indicator.
* Define a job stayer in December 1993 - February 1994 as a person whose current position started before December 1993 (i.e. November 1993 and before).
* In seasonal quarter 1 (sn93), Weeks 1-4 are September. Weeks 5-13 are October-November in seasonal quarter 1 (sn93) (and in the calendar quarter od93). Weeks 1-4 are December in the second seasonal quarter d93f (and in the calendar quarter od93).
* Interview structure of cohort weeks is: September: 4; October: 4; November: 5; December: 4; January: 4; February: 5.
* Source: lf_tcm77-232553.pdf
* "2.4 Single month data extraction
* The 13 weeks of the LFS survey quarter comprise a survey calendar of a rolling “4-4-5” weekly repeating pattern (a four-week-long reference month, followed by another four-week-long reference month, followed by a five-week-long reference month)." (p.6).
qui gen datem = tm(1993-9) if seasqrtr==1 & week>=1 & week<=4
qui replace datem = tm(1993-10) if seasqrtr==1 & week>=5 & week<=8
qui replace datem = tm(1993-11) if seasqrtr==1 & week>=9 & week<=13
qui replace datem = tm(1993-12) if seasqrtr==2 & week>=1 & week<=4
qui replace datem = tm(1994-1) if seasqrtr==2 & week>=5 & week<=8
qui replace datem = tm(1994-2) if seasqrtr==2 & week>=9 & week<=13
qui format datem %tm

* conmon, conmpy, consey have already been mvdecoded. They have not been cleaned of errors, but this is not needed prior to creation of the job stayer indicator.
* Create STATA month-year variable version of conmon and conmpy or consey: month and year started working for current employer or month and year started current spell of self-employment.
* Need to include cases where month of starting is unobserved: there are many of these, including all in earlier years, and many could be stayers. However, omit cases where year of starting is unobserved.
* Impute missing months as July in previous years and January in current year.
qui replace conmpy = conmpy + 1900 if conmpy~=.
qui replace consey = consey + 1900 if consey~=.
qui replace conmon = 7 if ///
    (conmon==. & ((consey~=. & consey<1993) | (conmpy~=. & conmpy<1993)) & year(dofm(datem))==1993) | ///
    (conmon==. & ((consey~=. & consey<1994) | (conmpy~=. & conmpy<1994)) & year(dofm(datem))==1994)
qui replace conmon = 1 if ///
    (conmon==. & ((consey~=. & consey==1993) | (conmpy~=. & conmpy==1993)) & year(dofm(datem))==1993) | ///
    (conmon==. & ((consey~=. & consey==1994) | (conmpy~=. & conmpy==1994)) & year(dofm(datem))==1994)
qui gen conmpy_yearmonth = ym(conmpy, conmon) if conmpy~=. & conmon~=.
label var conmpy_yearmonth "Month and year started working for current employer (STATA format)"
qui format conmpy_yearmonth %tm
qui gen consey_yearmonth = ym(consey, conmon) if consey~=. & conmon~=.
label var consey_yearmonth "Month and year started current spell of self-employment (STATA format)"
qui format consey_yearmonth %tm

* Generate general date (year, month) of starting current position.
qui gen startmonth = conmpy_yearmonth
qui replace startmonth = consey_yearmonth if consey_yearmonth~=. & startmonth==.
qui format startmonth %tm
qui label var startmonth "Month and year of starting current position (current employer or current self-emp spell) (STATA format)"

* Generate stayer indicator. Stayer is defined as someone who started current position prior to previous interview (coded as more than 3 months ago). A stayer is only useful in terms of Mapping if datem is December 1993 - February 1994.
qui gen stayer = 1 if startmonth<(datem - 3) & startmonth~=. & datem~=.
qui replace stayer = 0 if startmonth>=(datem - 3) & startmonth~=. & datem~=.
qui replace stayer = . if datem