Multivariate Analysis on Foreign Tourist Visits to India
project
statistics
Authors
Aman Das [BS2206]
Raj Pratap Singh [BS2219]
Shreyansh Mukhopadhyay [BS2147]
Published
May 7, 2023
Introduction
Aim
Our aim is to conduct an exploratory data analysis of the number of Foreign Tourist Arrivals in India over the period 2000-2019, attempting to find suitable predictors and to identify possible causal relations by fitting Multiple Linear Regression (MLR) models to the data.
Variables
Along with the response variable, the number of Foreign Tourist Arrivals in India (indfortour), we have 14 covariates as our predictor variables.
Almost all of the factors turn out to be correlated with each other. Thus, our original assumption that they are mutually independent is wrong.
The response variable indfortour is significantly related to all predictor variables, which suggests that there is some latent relation among the predictor variables.
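A correlation matrix is a quick way to check statements like these. The sketch below uses a small synthetic data frame rather than the project data; the column names y, x1 and x2 are made up for illustration.

```r
# Synthetic illustration only: x2 is built from x1, and y from both,
# so the three columns are correlated by construction.
set.seed(1)
n  <- 20
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.2)
y  <- 2 * x1 + x2 + rnorm(n, sd = 0.3)
toy <- data.frame(y = y, x1 = x1, x2 = x2)

# Pairwise Pearson correlations between all columns.
cmat <- cor(toy)
print(round(cmat, 2))

# Test whether the correlation between y and x1 differs from zero.
ct <- cor.test(toy$y, toy$x1)
print(ct$p.value)
```

Large off-diagonal entries between predictors are a sign of multicollinearity, which makes individual regression coefficients harder to interpret.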
We want to test our models in the situation where two predictors, indtourexp and indcrim, are absent. These are very important variables, but they are difficult to measure accurately.
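The simulation code in this section calls a helper named recdf() whose definition is not shown. Its name suggests random draws from an empirical CDF; that reading is an assumption. Under that assumption, a minimal stand-in (here called recdf_sketch, a hypothetical name) simply resamples the observed values with replacement:

```r
# Hypothetical stand-in for recdf(): draw r values from the empirical
# distribution of x by resampling the observed values with replacement.
recdf_sketch <- function(x, r) {
  x <- x[!is.na(x)]
  # Index-based sampling avoids the sample() pitfall when length(x) == 1.
  x[sample.int(length(x), size = r, replace = TRUE)]
}

set.seed(42)
observed <- c(10, 12, 15, 15, 20)  # made-up values, not project data
draws <- recdf_sketch(observed, 1000)

# Every draw is one of the observed values.
print(all(draws %in% observed))
print(mean(draws))
```

Averaging model predictions over many such draws treats the missing predictor as unknown but distributed like its historical values.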
Model 1 is the model involving all predictors. Model 2 is the model with the highest adjusted R².
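How Model 2 was selected is not shown here. One possible approach, sketched below on synthetic data with base R only, is to fit every non-empty subset of predictors and keep the one with the highest adjusted R². The data, the column names x1 to x4, and the exhaustive search itself are assumptions for illustration, not the authors' procedure.

```r
set.seed(7)
n <- 20
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$y <- 3 * toy$x1 - 2 * toy$x2 + rnorm(n, sd = 0.5)  # x3, x4 are pure noise

predictors <- c("x1", "x2", "x3", "x4")

# Full model: the analogue of Model 1.
full_fit <- lm(y ~ ., data = toy)

# Enumerate every non-empty subset of the predictors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Adjusted R^2 of the model that uses a given subset.
adj_r2 <- sapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = toy)
  summary(fit)$adj.r.squared
})

# The analogue of Model 2: the subset with the highest adjusted R^2.
best_vars <- subsets[[which.max(adj_r2)]]
best_fit  <- lm(reformulate(best_vars, response = "y"), data = toy)

print(best_vars)
print(c(full = summary(full_fit)$adj.r.squared,
        best = summary(best_fit)$adj.r.squared))
```

With 14 predictors the same search covers 2^14 - 1 = 16383 subsets, which is still feasible, though a dedicated subset-selection package may be more convenient.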
# knitr provides kable(); kableExtra provides column_spec() and the %>% pipe.
library(knitr)
library(kableExtra)

# model1, model2 and recdf() are not defined in this chunk; they are assumed
# to be available from earlier in the analysis.
data = read.csv(file.path(".", "data", "tourist_data_simulation_purpose.csv"),
                colClasses = "numeric", fileEncoding = 'UTF-8-BOM')

simulateindfortour = function(n) {
  r = 1000  # number of simulated draws per data point
  res = c(data[n, 1], data[n, 2], NA, NA, NA, NA)

  # using model1
  newdata <- data.frame(data[n, 3:16])
  indfortour <- predict(model1, newdata)
  res[3] = mean(indfortour)  # predicted value using model1

  indtourexpsim <- recdf(data$indtourexp, r)
  indcrimsim <- recdf(data$indcrim, r)
  for (i in 1:r) {
    # keep the observed predictors, replace indtourexp and indcrim by draws
    newrow <- c(data[n, 3], data[n, 4], data[n, 5], data[n, 6], data[n, 7],
                data[n, 8], data[n, 9], data[n, 10], data[n, 11], data[n, 12],
                indtourexpsim[i], indcrimsim[i], data[n, 15], data[n, 16])
    newdata <- rbind(newdata, newrow)
  }
  newdata <- newdata[-1, ]  # drop the observed row, keep the simulated rows
  indfortour <- predict(model1, newdata)
  res[4] = mean(indfortour)  # predicted value using model1 with missing predictors

  # using model2
  newdata <- data.frame(data[n, 3:16])
  indfortour <- predict(model2, newdata)
  res[5] = mean(indfortour)  # predicted value using model2

  indtourexpsim <- recdf(data$indtourexp, r)
  indcrimsim <- recdf(data$indcrim, r)
  for (i in 1:r) {
    newrow <- c(data[n, 3], data[n, 4], data[n, 5], data[n, 6], data[n, 7],
                data[n, 8], data[n, 9], data[n, 10], data[n, 11], data[n, 12],
                indtourexpsim[i], indcrimsim[i], data[n, 15], data[n, 16])
    newdata <- rbind(newdata, newrow)
  }
  newdata <- newdata[-1, ]  # drop the observed row, as for model1
  indfortour <- predict(model2, newdata)
  res[6] = mean(indfortour)  # predicted value using model2 with missing predictors

  return(res)
}

# data point 1
simtable = simulateindfortour(1)
# data point 2
simtable = rbind(simtable, simulateindfortour(2))
# data point 3
simtable = rbind(simtable, simulateindfortour(7))
# data point 4
simtable = rbind(simtable, simulateindfortour(15))

simtable = data.frame(simtable)
row.names(simtable) <- NULL

simtable %>%
  kable("html",
        col.names = c("Year", "Original Value",
                      "Model 1 prediction",
                      "Model 1 prediction after simulation",
                      "Model 2 prediction",
                      "Model 2 prediction after simulation")) %>%
  column_spec(1:6, width = "6cm")
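| Year | Original Value | Model 1 prediction | Model 1 prediction after simulation | Model 2 prediction | Model 2 prediction after simulation |
|------|----------------|--------------------|-------------------------------------|--------------------|-------------------------------------|
| 2000 | 2638813 | 2520774 | 3312408 | 2498458 | 3401112 |
| 2001 | 2537282 | 2550759 | 3350263 | 2546721 | 3506358 |
| 2006 | 4447167 | 4500597 | 5012900 | 4487057 | 5108487 |
| 2014 | 7679099 | 7810983 | 7509445 | 7802343 | 7431145 |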
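To compare the two models, the absolute prediction errors can be computed directly from the tabulated values. The sketch below retypes those values by hand; the data frame and column names (sim, err, m1_sim and so on) are made up for illustration.

```r
# Values copied from the results table (years 2000, 2001, 2006, 2014).
sim <- data.frame(
  year     = c(2000, 2001, 2006, 2014),
  observed = c(2638813, 2537282, 4447167, 7679099),
  m1_full  = c(2520774, 2550759, 4500597, 7810983),
  m1_sim   = c(3312408, 3350263, 5012900, 7509445),
  m2_full  = c(2498458, 2546721, 4487057, 7802343),
  m2_sim   = c(3401112, 3506358, 5108487, 7431145)
)

# Absolute error of each prediction against the observed value.
err <- data.frame(
  year    = sim$year,
  m1_full = abs(sim$m1_full - sim$observed),
  m1_sim  = abs(sim$m1_sim  - sim$observed),
  m2_full = abs(sim$m2_full - sim$observed),
  m2_sim  = abs(sim$m2_sim  - sim$observed)
)
print(err)

# Mean absolute error per prediction type.
print(colMeans(err[-1]))
```

In each of the four rows, the m1_sim error is smaller than the m2_sim error, so Model 1 degrades less when indtourexp and indcrim are replaced by simulated values.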
Conclusion
Our model has 14 covariates, of which four, namely intpop, intfortour, indimport and indpastra, are the most significant.
There is dependence among the predictor variables.
The R² and adjusted R² are very high, meaning our model fits indfortour well: it explains more than 99% of the variation in indfortour.
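If the data for indtourexp and indcrim are not available, our model using all the predictors (Model 1) performs better than the model with the highest adjusted R² (Model 2): in all four test years, its simulated prediction is closer to the observed value.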
Note that, due to the nature of our analysis, we could not consider tourists from different nations separately. Factors like relative GDP per capita, proximity between the nations, and flight prices would certainly affect the number of tourists arriving from each country.