360Studies

Your Destination for Career Excellence in Bioscience, Statistics, and Data Science

Import a Large-Scale Dataset in STATA


Importing a large-scale dataset into STATA involves two main steps. The first is to make sure STATA's limit on the number of variables is high enough for your data. If your dataset has more variables than the default limit (5,000 in STATA/SE and STATA/MP), you need to raise the limit before loading. Once the limit is high enough, you can load the dataset.

How many variables and observations can you analyse in STATA?

  1. STATA imposes a maximum limit on the number of variables in a dataset, known as “maxvar.” The default value of maxvar is 5,000 for STATA/MP and STATA/SE, and 2,048 for STATA/BE. For STATA/MP and STATA/SE, this default can be raised with the “set maxvar” command (up to 120,000 in STATA/MP and 32,767 in STATA/SE). For STATA/BE, the value is fixed at 2,048 and cannot be changed.
  2. In most cases, users do not need to be concerned about STATA’s memory management, as it operates automatically. However, the STATA memory documentation describes a serious bug affecting the Linux operating system; Linux users should read the “Remarks” section of that documentation for details.
  3. The maximum number of observations in a dataset is fixed at 1,099,511,627,775 for STATA/MP and 2,147,483,619 for STATA/SE and STATA/BE, regardless of the computer’s size or memory settings. Depending on the memory available on your computer, you may face a lower practical limit. For further guidance, see the “obs_advice” help topic in STATA (type “help obs_advice”).
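The limits listed above can be checked from within STATA itself. The snippet below is a minimal sketch that prints the current maxvar setting next to the theoretical maxima for the running edition. It assumes a reasonably recent STATA release in which the system values c(maxvar), c(max_k_theory), c(max_N_theory), c(SE), and c(MP) are available; the display labels are illustrative only.

```stata
* Flags describing the running edition (1 = yes, 0 = no)
display "c(SE): " c(SE) "   c(MP): " c(MP)

* Current limit on the number of variables (the maxvar setting)
display "Current maxvar: " c(maxvar)

* Theoretical maxima for this edition of STATA
display "Max variables (theory):    " c(max_k_theory)
display "Max observations (theory): " %22.0fc c(max_N_theory)
```

Typing “creturn list” shows these and the other system values in one place.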

Suppose your dataset contains 9,000 variables. This exceeds the default limit of 5,000, so the limit must be raised before the data can be loaded. The code below (Code 1) sets the maximum number of variables to 10,000, which is enough to hold all 9,000 variables. This step is only necessary if your dataset has more variables than the current limit. If your dataset has 5,000 variables or fewer, there is no need to change maxvar.

Before starting this exercise, you need the file “DataSet.dta”. The download link is given below.

Click here to download “DataSet.dta”

Step 1 :

Code 1 :

set maxvar 10000

Explanation : 

The set maxvar command defines the maximum number of variables that STATA can hold in memory. Setting it to 10000 means that when you load a dataset into STATA, you can access and work with up to 10,000 variables from that dataset. The command is available in STATA/SE and STATA/MP only.
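Since the limit only needs raising when it is too low, the step can be guarded with a check. The following is a minimal sketch, not the only way to do it: the local macro name “needed” is a hypothetical choice, and clearing the data first is a precaution that assumes nothing unsaved is in memory.

```stata
* Assumption: no unsaved data is in memory, so clearing is safe
clear

* Hypothetical target: enough room for a 9,000-variable dataset
local needed 10000

if c(maxvar) < `needed' {
    if c(SE) == 1 | c(MP) == 1 {
        * STATA/SE and STATA/MP allow the limit to be raised
        set maxvar `needed'
    }
    else {
        * STATA/BE: the limit is fixed and cannot be changed
        display as error "maxvar cannot be changed in this edition of STATA"
    }
}

display "maxvar is now " c(maxvar)
```

Adding the “permanently” option to set maxvar keeps the new value for future sessions.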

Step 2 :

Next, load the dataset into STATA. The use command shown below opens a STATA-format (“.dta”) file.

 

Code 2 (Importing “.dta” file):

use "DataSet.dta"

Or

use "/Users/pankajchowdhury/Desktop/NFHS5/DataSet.dta"

On my machine the file is located at “/Users/pankajchowdhury/Desktop/NFHS5/DataSet.dta”. Modify the file path to match the location of the file on your computer.

Explanation : 

In the provided code:

use: the command that instructs STATA to open a dataset and load it into memory.

DataSet.dta: the name of the STATA file. If only the file name is given, STATA looks for it in the current working directory; if a full path is given, STATA opens the file at that specific path.
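With a large file it can help to inspect it before loading, or to load only the variables you need. The sketch below assumes “DataSet.dta” is in the current working directory and uses two variable names from this dataset (hv005 and hv024) purely as an illustration.

```stata
* Inspect the file on disk without loading it into memory
describe using "DataSet.dta", short

* Load only two variables; the clear option replaces any data in memory
use hv005 hv024 using "DataSet.dta", clear

* Confirm what was loaded
describe
```

Loading a subset this way keeps memory use down and can avoid the need to raise maxvar at all.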

Results :

Observations: 636,699
Variables: 17

Variable name  Storage type  Display format  Value label  Variable label
hv005          long          %12.0g          hv005        household sample weight (6 decimals)
hv024          byte          %40.0g          HV024        state
hv206          byte          %8.0g           HV206        has electricity
hv207          byte          %8.0g           HV207        has radio
hv209          byte          %8.0g           HV209        has refrigerator
Mobile         byte          %8.0g           HV243A       has mobile telephone
Computer       byte          %8.0g           HV243E       has a computer
sh48           int           %21.0g          SH48         what is the caste or tribe of the head of the household?
Internet       byte          %8.0g           SH50N        internet
HH_Owner       byte          %8.0g           SH61         does this household own this house or any other house?
Edu_Years      byte          %12.0g          Edu_Years    RECODE of hv108_01 (education completed in single years)
Caste          byte          %9.0g           Caste        RECODE of sh49 (is this a scheduled caste,a scheduled tribe, other backward class)
HH             byte          %9.0g           HH           RECODE of hv219 (sex of head of household)
Religion       byte          %9.0g           Religion     RECODE of sh47 (what is the religion of the head of the household?)
AgeGrp         byte          %9.0g           AgeGrp       RECODE of hv220 (age of head of household)
BPC            byte          %9.0g           BPC          RECODE of BPL_Card (does this household have a bpl card?)
HH_Type        byte          %10.0g          HH_Type      RECODE of HHType (house type (as defined in nfhs-2 and 3))
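The summary above has the layout that STATA's describe command prints, so a listing of this kind can be reproduced after loading the file. The sketch below assumes “DataSet.dta” is in the current working directory; the final check against 17 variables reflects the result shown above and would need adjusting for any other file.

```stata
* Load the full dataset, replacing any data in memory
use "DataSet.dta", clear

* List variable names, storage types, formats, and labels
describe

* Report the dimensions of the loaded data
display "Observations: " %12.0fc c(N)
display "Variables: " c(k)

* Stop with an error if the variable count is not the expected 17
assert c(k) == 17
```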
