Before starting this exercise, you should have the “Digital.dta” file. (Download link is given below)
Click here to download – “Digital.dta”
Importing the “Digital.dta” file.
set maxvar 10000
use "/Users/pankajchowdhury/Downloads/Digital.dta"
My copy of the file is at "/Users/pankajchowdhury/Downloads/Digital.dta". Change the path to match the location on your machine.
Observations: 724115
Variables: 31

Variable name | Storage type | Display format | Value label | Variable label |
---|---|---|---|---|
v005 | long | %12.0g | v005 | women's individual sample weight (6 decimals) |
v013 | byte | %8.0g | V013 | age in 5-year groups |
v024 | byte | %8.0g | V024 | state |
v025 | byte | %8.0g | V025 | type of place of residence |
v130 | byte | %8.0g | V130 | religion |
v133 | byte | %8.0g | V133 | education in single years |
v151 | byte | %8.0g | V151 | sex of household head |
v157 | byte | %8.0g | V157 | frequency of reading newspaper or magazine |
v158 | byte | %8.0g | V158 | frequency of listening to radio |
v159 | byte | %8.0g | V159 | frequency of watching television |
v169a | byte | %8.0g | V169A | owns a mobile telephone |
v190 | byte | %8.0g | V190 | wealth index combined |
v217 | byte | %8.0g | V217 | knowledge of ovulatory cycle |
v504 | byte | %8.0g | V504 | currently residing with husband/partner |
v702 | byte | %8.0g | V702 | husband/partner's highest year of education (at level in v701) |
v704 | byte | %8.0g | V704 | husband/partner's occupation |
v715 | byte | %8.0g | V715 | husband/partner's total number of years of education |
v730 | byte | %8.0g | V730 | husband/partner's age |
v743f | byte | %8.0g | V743F | person who usually decides what to do with money husband earns |
v746 | byte | %8.0g | V746 | respondent earns more than husband/partner |
d005 | long | %12.0g | d005 | weight for domestic violence (6 decimals) |
d102 | byte | %8.0g | d012 | number of control issues answered 'yes' (d101x = 1) |
sweight | long | %12.0g | sweight | sample weight (6 decimals) (state level) |
s116 | byte | %8.0g | S116 | belong to a scheduled caste, a scheduled tribe, other backward class |
s303 | int | %8.0g | S303 | time period not living with husband |
s311 | byte | %8.0g | S311 | type of relationship to current husband,prior to marriage |
s931 | byte | %8.0g | S931 | do you have a bank or savings account that you yourself use |
s932 | byte | %8.0g | S932 | do you have any mobile phone that you yourself use |
s933 | byte | %8.0g | S933 | do you use your mobile phone for any financial transaction ? |
s934 | byte | %8.0g | S934 | have you ever used the internet? |
s1004p | byte | %8.0g | S1004P | source of information about aids: internet |
Code 1:
numlabel, add
Explanation :
In Stata, the numlabel command prefixes each value label with its numeric code. It does not create or assign value labels; it only edits labels that already exist. With the add option and no label names listed, every value label in the dataset gets the prefix, so “poorest” becomes “1. poorest”. This makes it easy to see which number sits behind each category before you recode. The remove option undoes the change.
You can now see the prefixed labels by tabulating a variable.
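To see what numlabel does in isolation, here is a minimal sketch on a three-row toy dataset rather than Digital.dta. The variable wealth and the label name wealthlbl are made up for illustration.

```stata
* Toy data (not Digital.dta): three wealth codes with a value label
clear
input byte wealth
1
2
3
end
label define wealthlbl 1 "poorest" 2 "poorer" 3 "middle"
label values wealth wealthlbl

* Before: the table shows poorest, poorer, middle
tabulate wealth

* Prefix each label of wealthlbl with its numeric code
numlabel wealthlbl, add

* After: the table shows 1. poorest, 2. poorer, 3. middle
tabulate wealth

* Restore the original label text
numlabel wealthlbl, remove
```

Naming the label (wealthlbl) limits the change to that one label; leaving the name out, as in Code 1, applies it to every value label in memory.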
Code 2:
ta v190
Result :
wealth index combined | Freq. | Percent | Cum. |
---|---|---|---|
1. poorest | 1,49,844 | 20.69 | 20.69 |
2. poorer | 1,60,340 | 22.14 | 42.84 |
3. middle | 1,51,505 | 20.92 | 63.76 |
4. richer | 1,39,607 | 19.28 | 83.04 |
5. richest | 1,22,819 | 16.96 | 100 |
Total | 7,24,115 | 100 |
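Explanation : ta is an abbreviation of tabulate. ta v190 produces a one-way frequency table for v190 showing the frequency, percentage, and cumulative percentage of each category. Missing values are left out of the table unless you add the missing option.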
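Two tabulate options are handy when preparing a recode: missing and nolabel. The sketch below uses a small made-up variable (wealth, with one missing value) rather than the survey data.

```stata
* Toy data (not Digital.dta): two categories and one missing value
clear
input byte wealth
1
1
2
.
end
label define wealthlbl 1 "poorest" 2 "poorer"
label values wealth wealthlbl

* Default table: the missing observation is not counted
tabulate wealth

* Add a row for missing values
tabulate wealth, missing

* Show the underlying numeric codes instead of the label text
tabulate wealth, nolabel
```

Comparing the totals of the first two tables shows how many observations are missing, which matters later when a recode rule such as else is used.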
Now, I will present various recoding techniques to you, demonstrating different ways to transform and manipulate variables in Stata.
Code 3:
recode v190 (1 2 = 1 "Poor") (3 = 2 "Middle") (4/5 = 3 "Rich"), gen(Income) label(Income)
ta Income
Explanation :
The recode command does two things: it recodes the values of v190 and it labels the result.
- Recoding:
  - recode v190: names v190 as the variable to recode.
  - (1 2 = 1 "Poor"): recodes the values 1 and 2 to 1 and labels that value “Poor”.
  - (3 = 2 "Middle"): recodes the value 3 to 2 and labels that value “Middle”.
  - (4/5 = 3 "Rich"): recodes the values 4 through 5 to 3 and labels that value “Rich”.
- Labelling:
  - gen(Income): stores the recoded values in a new variable named Income. The original v190 is left unchanged.
  - label(Income): names the value label that holds “Poor”, “Middle” and “Rich” and attaches it to Income. It does not set the variable label, which recode fills in automatically as “RECODE of v190 (wealth index combined)”.
To summarize, the command collapses the five wealth quintiles of v190 into three groups, stores them in the new variable Income, and attaches a value label, also named Income, to that variable. ta Income then displays the result.
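The same pattern can be tried on a five-row toy dataset before running it on the full file. This is a sketch only; wealth, group and grouplbl are hypothetical names, not variables in Digital.dta.

```stata
* Toy data (not Digital.dta): one row per wealth quintile
clear
input byte wealth
1
2
3
4
5
end

* Collapse five codes into three labelled groups
recode wealth (1 2 = 1 "Poor") (3 = 2 "Middle") (4/5 = 3 "Rich"), generate(group) label(grouplbl)

* Cross-tabulate old against new to check every code landed where intended
tabulate wealth group

* Inspect the value label that recode created
label list grouplbl

* Fail loudly if the mapping is not what we expect
assert group == 1 if inlist(wealth, 1, 2)
assert group == 2 if wealth == 3
assert group == 3 if inlist(wealth, 4, 5)
```

A two-way table of the old variable against the new one is a quick way to confirm a recode, because every row should have exactly one non-empty cell.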
Result:
RECODE of v190 (wealth index combined) | Freq. | Percent | Cum. |
---|---|---|---|
Poor | 3,10,184 | 42.84 | 42.84 |
Middle | 1,51,505 | 20.92 | 63.76 |
Rich | 2,62,426 | 36.24 | 100 |
Total | 7,24,115 | 100 |
Code 4:
ta v024
Result :
state | Freq. | Percent | Cum. |
---|---|---|---|
1. jammu & kashmir | 23,037 | 3.18 | 3.18 |
2. himachal pradesh | 10,368 | 1.43 | 4.61 |
3. punjab | 21,771 | 3.01 | 7.62 |
4. chandigarh | 746 | 0.1 | 7.72 |
5. uttarakhand | 13,280 | 1.83 | 9.56 |
6. haryana | 21,909 | 3.03 | 12.58 |
7. nct of delhi | 11,159 | 1.54 | 14.12 |
8. rajasthan | 42,990 | 5.94 | 20.06 |
9. uttar pradesh | 93,124 | 12.86 | 32.92 |
10. bihar | 42,483 | 5.87 | 38.79 |
11. sikkim | 3,271 | 0.45 | 39.24 |
12. arunachal pradesh | 19,765 | 2.73 | 41.97 |
13. nagaland | 9,694 | 1.34 | 43.31 |
14. manipur | 8,042 | 1.11 | 44.42 |
15. mizoram | 7,279 | 1.01 | 45.42 |
16. tripura | 7,314 | 1.01 | 46.43 |
17. meghalaya | 13,089 | 1.81 | 48.24 |
18. assam | 34,979 | 4.83 | 53.07 |
19. west bengal | 21,408 | 2.96 | 56.03 |
20. jharkhand | 26,495 | 3.66 | 59.69 |
21. odisha | 27,971 | 3.86 | 63.55 |
22. chhattisgarh | 28,468 | 3.93 | 67.48 |
23. madhya pradesh | 48,410 | 6.69 | 74.17 |
24. gujarat | 33,343 | 4.6 | 78.77 |
25. dadra & nagar haveli and daman & diu | 2,713 | 0.37 | 79.15 |
27. maharashtra | 33,755 | 4.66 | 83.81 |
28. andhra pradesh | 10,975 | 1.52 | 85.32 |
29. karnataka | 30,455 | 4.21 | 89.53 |
30. goa | 2,030 | 0.28 | 89.81 |
31. lakshadweep | 1,234 | 0.17 | 89.98 |
32. kerala | 10,969 | 1.51 | 91.49 |
33. tamil nadu | 25,650 | 3.54 | 95.04 |
34. puducherry | 3,669 | 0.51 | 95.54 |
35. andaman & nicobar islands | 2,397 | 0.33 | 95.87 |
36. telangana | 27,518 | 3.8 | 99.67 |
37. ladakh | 2,355 | 0.33 | 100 |
Total | 7,24,115 | 100 |
recode v024 (1 2 3 4 6 7 8 37=1 "Northern Region") (5 9 23 22=2 "Central Region") (10 19 20 21 11=3 "Eastern Region") (12/18=4 "North Eastern Region") (24 25 27=5 "Western Region") (28/36=6 "Southern Region"), gen(Region1) label(Region1)
ta Region1
Explanation :
The recode command above maps the state codes in v024 to six regions, stores the result in a new variable, and labels the new values.
Here’s a breakdown of the command:
- Recoding:
  - recode v024: names v024 as the variable to recode.
  - (1 2 3 4 6 7 8 37=1 "Northern Region"): recodes the values 1, 2, 3, 4, 6, 7, 8 and 37 to 1 and labels that value “Northern Region”.
  - (5 9 23 22=2 "Central Region"): recodes the values 5, 9, 23 and 22 to 2 and labels that value “Central Region”.
  - (10 19 20 21 11=3 "Eastern Region"): recodes the values 10, 19, 20, 21 and 11 to 3 and labels that value “Eastern Region”.
  - (12/18=4 "North Eastern Region"): recodes the values 12 through 18 to 4 and labels that value “North Eastern Region”.
  - (24 25 27=5 "Western Region"): recodes the values 24, 25 and 27 to 5 and labels that value “Western Region”.
  - (28/36=6 "Southern Region"): recodes the values 28 through 36 to 6 and labels that value “Southern Region”.
- Generating and labeling:
  - gen(Region1): stores the recoded values in a new variable named Region1. The original v024 is left unchanged.
  - label(Region1): names the value label that holds the six region names and attaches it to Region1.
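For comparison, the same mapping can be built by hand with generate, replace and label define. This is a different technique from recode, shown here only to make clear what recode is doing behind the scenes. It is a sketch on a few made-up rows; state, region, region_rc and regionlbl are hypothetical names.

```stata
* Toy data (not Digital.dta): one state code from each region, plus Ladakh (37)
clear
input byte state
1
5
10
12
24
28
37
end

* Manual version: one replace per region
generate byte region = .
replace region = 1 if inlist(state, 1, 2, 3, 4, 6, 7, 8, 37)
replace region = 2 if inlist(state, 5, 9, 22, 23)
replace region = 3 if inlist(state, 10, 11, 19, 20, 21)
replace region = 4 if inrange(state, 12, 18)
replace region = 5 if inlist(state, 24, 25, 27)
replace region = 6 if inrange(state, 28, 36)

* Define and attach the value label in separate steps
label define regionlbl 1 "Northern Region" 2 "Central Region" 3 "Eastern Region"
label define regionlbl 4 "North Eastern Region" 5 "Western Region" 6 "Southern Region", add
label values region regionlbl

* recode version of the same mapping, without labels
recode state (1 2 3 4 6 7 8 37 = 1) (5 9 22 23 = 2) (10 11 19 20 21 = 3) (12/18 = 4) (24 25 27 = 5) (28/36 = 6), generate(region_rc)

* Both approaches should agree on every row
assert region == region_rc
tabulate state region
```

The recode version keeps the mapping and the labels in one statement, which lowers the risk of the two drifting apart.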
Result :
RECODE of v024 (state) | Freq. | Percent | Cum. |
---|---|---|---|
Northern Region | 1,34,335 | 18.55 | 18.55 |
Central Region | 1,83,282 | 25.31 | 43.86 |
Eastern Region | 1,21,628 | 16.8 | 60.66 |
North Eastern Region | 1,00,162 | 13.83 | 74.49 |
Western Region | 69,811 | 9.64 | 84.13 |
Southern Region | 1,14,897 | 15.87 | 100 |
Total | 7,24,115 | 100 |
Code 5:
recode v024 (min/4 6/8 37=1 "Northern Region") (5 9 23 22=2 "Central Region") (10 19 20 21 11=3 "Eastern Region") (12/18=4 "North Eastern Region") (24 25 27=5 "Western Region") (28/max=6 "Southern Region"), gen(Region2) label(Region2)
ta Region2
Explanation :
This version builds the same regions as Code 4 but uses the keywords min and max in place of explicit end points. Let’s break down the command:
- Recoding:
  - recode v024: names v024 as the variable to recode.
  - (min/4 6/8 37=1 "Northern Region"): recodes every value from the minimum of v024 (here 1) up to 4, the values 6 through 8, and the value 37 to 1, labelled “Northern Region”. The range stops at 4 and resumes at 6 because code 5 (Uttarakhand) belongs to the Central Region. recode applies the first rule that matches a value, so writing min/8 would place Uttarakhand in the Northern Region and the Central rule would never reach it.
  - (5 9 23 22=2 "Central Region"): recodes the values 5, 9, 23 and 22 to 2, labelled “Central Region”.
  - (10 19 20 21 11=3 "Eastern Region"): recodes the values 10, 19, 20, 21 and 11 to 3, labelled “Eastern Region”.
  - (12/18=4 "North Eastern Region"): recodes the values 12 through 18 to 4, labelled “North Eastern Region”.
  - (24 25 27=5 "Western Region"): recodes the values 24, 25 and 27 to 5, labelled “Western Region”.
  - (28/max=6 "Southern Region"): recodes every value from 28 up to the maximum of v024 (here 37) to 6, labelled “Southern Region”. Code 37 (Ladakh) is not affected because the first rule has already claimed it.
- Labeling:
  - gen(Region2): stores the recoded values in a new variable named Region2.
  - label(Region2): names the value label that holds the six region names and attaches it to Region2.
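One thing to watch with ranges such as min/8 is overlap between rules. To my understanding, recode uses the first rule that matches a value and ignores later ones, so a wide range can quietly swallow a code that a later rule was meant to handle. The sketch below illustrates this on four made-up rows; state, overlap and clean are hypothetical names.

```stata
* Toy data (not Digital.dta): three northern codes plus code 5 (Central)
clear
input byte state
1
5
8
37
end

* Overlapping rules: 5 falls inside min/8, so the second rule never sees it
recode state (min/8 37 = 1 "Northern") (5 = 2 "Central"), generate(overlap) label(overlaplbl)

* Non-overlapping rules: the first range skips 5
recode state (min/4 6/8 37 = 1 "Northern") (5 = 2 "Central"), generate(clean) label(cleanlbl)

* Compare the two results side by side
list state overlap clean

* Code 5 ends up in different groups depending on how the range is written
assert overlap == 1 if state == 5
assert clean == 2 if state == 5
```

Cross-tabulating the original variable against the recoded one, as in ta v024 Region2, is a simple way to catch this kind of slip on the real data.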
Results :
RECODE of v024 (state) | Freq. | Percent | Cum. |
---|---|---|---|
Northern Region | 1,34,335 | 18.55 | 18.55 |
Central Region | 1,83,282 | 25.31 | 43.86 |
Eastern Region | 1,21,628 | 16.8 | 60.66 |
North Eastern Region | 1,00,162 | 13.83 | 74.49 |
Western Region | 69,811 | 9.64 | 84.13 |
Southern Region | 1,14,897 | 15.87 | 100 |
Total | 7,24,115 | 100 |
Code 6:
recode v024 (1/4 6/8 37=1 "Northern Region") (5 9 23 22=2 "Central Region") (10 19 20 21 11=3 "Eastern Region") (12/18=4 "North Eastern Region") (24 25 27=5 "Western Region") (nonmiss=6 "Southern Region"), gen(Region3) label(Region3)
ta Region3
Explanation :
This version again builds the same regions, but uses ranges for the Northern Region and the nonmiss keyword as a catch-all for the Southern Region.
Let’s break down the command step by step:
- Recoding:
  - recode v024: names v024 as the variable to recode.
  - (1/4 6/8 37=1 "Northern Region"): recodes the values 1 through 4, 6 through 8, and 37 to 1, labelled “Northern Region”. Code 5 (Uttarakhand) is left out of the range because it belongs to the Central Region; recode applies the first rule that matches, so 1/8 would assign it to the Northern Region.
  - (5 9 23 22=2 "Central Region"): recodes the values 5, 9, 23 and 22 to 2, labelled “Central Region”.
  - (10 19 20 21 11=3 "Eastern Region"): recodes the values 10, 19, 20, 21 and 11 to 3, labelled “Eastern Region”.
  - (12/18=4 "North Eastern Region"): recodes the values 12 through 18 to 4, labelled “North Eastern Region”.
  - (24 25 27=5 "Western Region"): recodes the values 24, 25 and 27 to 5, labelled “Western Region”.
  - (nonmiss=6 "Southern Region"): recodes every nonmissing value not matched by an earlier rule to 6, labelled “Southern Region”. Here that means codes 28 through 36. Because it is a catch-all, any unexpected code would also land in the Southern Region, so it must come last and should be used with care.
- Generating and labeling:
  - gen(Region3): stores the recoded values in a new variable named Region3.
  - label(Region3): names the value label that holds the six region names and attaches it to Region3.
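The behaviour of the nonmissing catch-all (written nonmiss in the command above) can be checked on a small example. This is a sketch on made-up data; code, grp and grplbl are hypothetical names.

```stata
* Toy data (not Digital.dta): three codes and one missing value
clear
input byte code
1
2
8
.
end

* Rule 1 handles code 1; nonmissing picks up every other nonmissing value
recode code (1 = 1 "A") (nonmissing = 9 "Rest"), generate(grp) label(grplbl)

list code grp

* Codes 2 and 8 fall through to the catch-all
assert grp == 9 if inlist(code, 2, 8)

* The missing value is not touched by nonmissing and stays missing
assert missing(grp) if missing(code)
```

The catch-all only sees values that no earlier rule has matched, which is why it has to be the last rule in the list.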
Results :
RECODE of v024 (state) | Freq. | Percent | Cum. |
---|---|---|---|
Northern Region | 1,34,335 | 18.55 | 18.55 |
Central Region | 1,83,282 | 25.31 | 43.86 |
Eastern Region | 1,21,628 | 16.8 | 60.66 |
North Eastern Region | 1,00,162 | 13.83 | 74.49 |
Western Region | 69,811 | 9.64 | 84.13 |
Southern Region | 1,14,897 | 15.87 | 100 |
Total | 7,24,115 | 100 |
Code 7:
ta s116
Results :
belong to a scheduled caste, a scheduled tribe, other backward class | Freq. | Percent | Cum. |
---|---|---|---|
1. schedule caste | 1,39,957 | 20.3 | 20.3 |
2. schedule tribe | 1,35,239 | 19.62 | 39.92 |
3. obc | 2,76,881 | 40.16 | 80.07 |
4. none of them | 1,33,347 | 19.34 | 99.42 |
8. don't know | 4,030 | 0.58 | 100 |
Total | 6,89,454 | 100 |
recode s116 (1=1 "Schedule Caste") (2=2 "Schedule Tribe") (3=3 "OBC") (else=4 "Others"), gen(Caste) label(Caste2)
ta Caste
Explanation :
The recode command can be explained as follows:
- recode s116: names s116 as the variable to recode.
- (1=1 "Schedule Caste"): keeps the value 1 as 1 and labels it “Schedule Caste”.
- (2=2 "Schedule Tribe"): keeps the value 2 as 2 and labels it “Schedule Tribe”.
- (3=3 "OBC"): keeps the value 3 as 3 and labels it “OBC”.
- (else=4 "Others"): recodes every value not matched by an earlier rule, including missing values, to 4 and labels it “Others”. Here that covers 4 (none of them), 8 (don't know) and the 34,661 observations where s116 is missing, which is why the total rises from 6,89,454 in the first table to 7,24,115 in the second.
- gen(Caste): stores the recoded values in a new variable named Caste.
- label(Caste2): names the value label that holds the four category names and attaches it to Caste. The value label name does not have to match the variable name.
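Because else also captures missing values, it is worth counting them before deciding whether they belong in “Others”. The sketch below shows both choices on made-up data; caste, grp_all, grp_keep and the label names are hypothetical.

```stata
* Toy data (not Digital.dta): codes 1-4, a "don't know" (8) and one missing value
clear
input byte caste
1
2
3
4
8
.
end

* How many observations are missing before recoding?
count if missing(caste)

* else: missing values are folded into "Others"
recode caste (1 = 1 "SC") (2 = 2 "ST") (3 = 3 "OBC") (else = 4 "Others"), generate(grp_all) label(grp_all_lbl)

* Explicit list without a rule for missing: missing values stay missing
recode caste (1 = 1 "SC") (2 = 2 "ST") (3 = 3 "OBC") (4 8 = 4 "Others"), generate(grp_keep) label(grp_keep_lbl)

* Show missing values in the cross-tabulation
tabulate grp_all grp_keep, missing

assert grp_all == 4 if missing(caste)
assert missing(grp_keep) if missing(caste)
```

Which version is appropriate depends on the analysis; the point is only that else makes the choice silently.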
Results:
RECODE of s116 (belong to a scheduled caste, a scheduled tribe, other backward class) | Freq. | Percent | Cum. |
---|---|---|---|
Schedule Caste | 1,39,957 | 19.33 | 19.33 |
Schedule Tribe | 1,35,239 | 18.68 | 38 |
OBC | 2,76,881 | 38.24 | 76.24 |
Others | 1,72,038 | 23.76 | 100 |
Total | 7,24,115 | 100 |
Code 8:
recode s116 (1=1 "Schedule Caste") (2=2 "Schedule Tribe") (3=3 "OBC") (4 8 .=4 "Others"), gen(Caste1) label(Caste1)
ta Caste1
Explanation :
This command gives the same result as Code 7, but lists the values that go into “Others” explicitly instead of using else. The new variable is named Caste1 because Caste already exists from Code 7, and recode will not overwrite an existing variable.
- Recoding:
  - recode s116: names s116 as the variable to recode.
  - (1=1 "Schedule Caste"): keeps the value 1 as 1 and labels it “Schedule Caste”.
  - (2=2 "Schedule Tribe"): keeps the value 2 as 2 and labels it “Schedule Tribe”.
  - (3=3 "OBC"): keeps the value 3 as 3 and labels it “OBC”.
  - (4 8 .=4 "Others"): recodes the values 4 and 8 to 4 and labels that value “Others”. The . (period) stands for a missing value, so observations where s116 is missing are also recoded to 4.
- Labeling:
  - gen(Caste1): stores the recoded values in a new variable named Caste1.
  - label(Caste1): names the value label that holds the four category names and attaches it to Caste1.
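Two practical points from Codes 7 and 8 can be checked on a toy example: generate() refuses a variable name that already exists, and the else rule and the explicit list 4 8 . give the same result when those are the only remaining values. This is a sketch on made-up data; caste, grp, grp2 and the label names are hypothetical.

```stata
* Toy data (not Digital.dta)
clear
input byte caste
1
4
8
.
end

* First recode, using else as the catch-all
recode caste (1 = 1 "SC") (else = 4 "Others"), generate(grp) label(grplbl)

* Reusing the name grp fails because the variable already exists;
* capture stores the error code in _rc instead of stopping the do-file
capture recode caste (1 = 1 "SC") (4 8 . = 4 "Others"), generate(grp) label(grplbl2)
display "return code: " _rc
assert _rc != 0

* Same rules with a fresh variable name
recode caste (1 = 1 "SC") (4 8 . = 4 "Others"), generate(grp2) label(grplbl2)

* Both versions agree on every row
assert grp == grp2
```

If you would rather reuse the name, drop the old variable and its value label first, then rerun the recode.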
Results:
RECODE of s116 (belong to a scheduled caste, a scheduled tribe, other backward class) | Freq. | Percent | Cum. |
---|---|---|---|
Schedule Caste | 1,39,957 | 19.33 | 19.33 |
Schedule Tribe | 1,35,239 | 18.68 | 38 |
OBC | 2,76,881 | 38.24 | 76.24 |
Others | 1,72,038 | 23.76 | 100 |
Total | 7,24,115 | 100 |