Importing a large-scale dataset into STATA takes two main steps. First, make sure the dataset fits within the limit on the number of variables: if it has more than the default limit of 5,000 variables, raise that limit to cover the variables you need. Second, load the dataset.
How many variables and observations can you analyse in STATA?
- STATA imposes a maximum limit on the number of variables in a dataset, known as “maxvar.” The default value of maxvar is 5,000 for STATA/MP and STATA/SE, and 2,048 for STATA/BE. For STATA/MP and STATA/SE, this default can be raised with the “set maxvar” command, up to 120,000 for STATA/MP and 32,767 for STATA/SE. For STATA/BE, the value is fixed at 2,048 and cannot be changed.
- In most cases, users do not need to be concerned about STATA’s memory management as it operates automatically. However, if you are using the Linux operating system, there is a serious bug that should be noted. Please refer to the “Remarks” section below for more information.
- The maximum number of observations in a dataset is fixed at 1,099,511,627,775 for STATA/MP and 2,147,483,619 for STATA/SE and STATA/BE, regardless of the computer’s size or memory settings. Depending on the memory available on your computer, you may hit a lower practical limit. For further guidance, type “help obs_advice” in STATA.
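Before changing anything, it can help to check which limits apply to your own installation. The following is a minimal sketch that only reads settings; the exact figures it prints depend on your edition and configuration.

```stata
* Show which edition of STATA is running (BE, SE, or MP)
about

* Current ceiling on the number of variables
display c(maxvar)

* Maximum number of observations this edition allows in theory
display %20.0fc c(max_N_theory)

* Full list of memory-related settings, including maxvar
query memory
```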
If your dataset contains 9,000 variables, for example, you need to raise the maximum variable limit in STATA before loading it. The code provided below (Code 1) sets the maximum number of variables to 10,000, which lets you work with all the variables in such a dataset. This setup is only necessary if your dataset exceeds the default limit of 5,000 variables. If your dataset has 5,000 variables or fewer, there is no need to modify the limit.
Before starting this exercise, you should have the “DataSet.dta” file. (Download link is given below)
Click here to download – “DataSet.dta”
Step 1 :
Code 1 :
set maxvar 10000
Explanation :
The set maxvar command defines the maximum number of variables that STATA can hold in memory. Setting it to 10000 means that when you load a dataset into STATA, you can access and work with up to 10,000 variables from that dataset.
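As a slightly fuller sketch of Step 1, the lines below clear memory first, raise the limit, and then confirm it. The “clear all” line is an addition to the original code: to my knowledge STATA refuses to change maxvar while a dataset is in memory, so this assumes you have no unsaved work open. It applies to STATA/SE and STATA/MP only.

```stata
* maxvar is normally changed while no dataset is in memory,
* so clear memory first (this discards any unsaved data)
clear all

* Raise the ceiling to 10,000 variables for this session
set maxvar 10000

* Confirm the new setting
display c(maxvar)
```

To keep the setting across sessions, “set maxvar 10000, permanently” can be used instead of the plain command.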
Step 2 :
Next, load the dataset into STATA. The following command opens a “.dta” file, either by file name alone or by its full path.
Code 2 (Importing “.dta” file):
use "DataSet.dta"
Or
use "/Users/pankajchowdhury/Desktop/NFHS5/DataSet.dta"
I have the file located at “/Users/pankajchowdhury/Desktop/NFHS5/DataSet.dta”. Feel free to modify the file path according to your needs.
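The two forms of the command differ only in how STATA finds the file: a bare file name is looked up in the current working directory. A minimal sketch of that relationship follows, reusing the example path from above (replace it with your own). The “clear” option is an addition that drops any dataset already in memory before loading.

```stata
* Show the current working directory
pwd

* Switch to the folder that holds the file
* (example path from the text; replace it with your own)
cd "/Users/pankajchowdhury/Desktop/NFHS5"

* A bare file name is now resolved relative to that folder;
* clear drops any data already in memory before loading
use "DataSet.dta", clear
```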
Explanation :
In the provided code:
- DataSet.dta – The name of the STATA data file to open, given either relative to the current working directory or as a full path.
- use – The use command instructs STATA to open the file at that path and load it into memory.
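With a large file, it is often not necessary to load everything. The sketch below shows three variations on the use command, assuming “DataSet.dta” is in the working directory; the variable names are taken from the results listing in this article, and the cut-off of 1,000 observations is an arbitrary choice for illustration.

```stata
* Inspect the file's variables without loading it into memory
describe using "DataSet.dta"

* Load only selected variables
use hv005 hv024 hv206 using "DataSet.dta", clear

* Load only the first 1,000 observations of every variable
use in 1/1000 using "DataSet.dta", clear
```

Loading a subset this way keeps memory use down and can also keep the number of variables below maxvar without raising the limit.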
Results :
Observations: 636,699
Variables: 17
Variable name | Storage type | Display | Value label | Variable label |
hv005 | long | %12.0g | hv005 | household sample weight (6 decimals) |
hv024 | byte | %40.0g | HV024 | state |
hv206 | byte | %8.0g | HV206 | has electricity |
hv207 | byte | %8.0g | HV207 | has radio |
hv209 | byte | %8.0g | HV209 | has refrigerator |
Mobile | byte | %8.0g | HV243A | has mobile telephone |
Computer | byte | %8.0g | HV243E | has a computer |
sh48 | int | %21.0g | SH48 | what is the caste or tribe of the head of the household? |
Internet | byte | %8.0g | SH50N | internet |
HH_Owner | byte | %8.0g | SH61 | does this household own this house or any other house? |
Edu_Years | byte | %12.0g | Edu_Years | RECODE of hv108_01 (education completed in single years) |
Caste | byte | %9.0g | Caste | RECODE of sh49 (is this a scheduled caste,a scheduled tribe, other backward class) |
HH | byte | %9.0g | HH | RECODE of hv219 (sex of head of household) |
Religion | byte | %9.0g | Religion | RECODE of sh47 (what is the religion of the head of the household?) |
AgeGrp | byte | %9.0g | AgeGrp | RECODE of hv220 (age of head of household) |
BPC | byte | %9.0g | BPC | RECODE of BPL_Card (does this household have a bpl card?) |
HH_Type | byte | %10.0g | HH_Type | RECODE of HHType (house type (as defined in nfhs-2 and 3)) |
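The listing above has the shape of the output of the describe command, although the article does not name the command that produced it. Assuming the dataset is already loaded, a sketch like the following should reproduce the same information and let you check the counts.

```stata
* List every variable with its storage type, display format and labels
describe

* Number of observations and number of variables currently in memory
display c(N)
display c(k)
```

If the file matches the one used here, the two counts should agree with the 636,699 observations and 17 variables reported above.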