Abstract—Protecting the environment while sustaining
economic growth is a tough task for every country in the world,
especially for China. China has required major cities to
publicise their Air Pollution Index since 2000 (changed to Air
Quality Index in 2012). Since then, the AQI has become one of
the critical indicators for the central government to assess the
local governments' performance. A comparison of AQI data
from the US Embassy with the official data from 35 Beijing air
quality monitoring stations reveals significant manipulation of
the reported AQI values (to just below the Blue Sky threshold of 100). This research aims
to find a way to predict the true AQI values through search
entries in Baidu – the largest search engine in China. This
would remove the need to rely on the data reported by the air
quality monitoring stations, which seem to be unreliable. In total, 73
search entries relating to air pollution and haze were collected
from Baidu to run a LASSO (least absolute shrinkage and
selection operator) analysis. Cross-validation was used to justify
the LASSO analysis and to determine the shrinkage factor.
After the LASSO analysis and cross-validation, 33
predictors remained, predicting AQI from search entries with an R² of
0.69. These results indicate that search entries can be an
alternative way to predict AQI, explaining about 69% of its variance.
Owing to limited time, only 73 search entries were
included in the dataset. In future research, a much higher
prediction accuracy would be expected if more than 500 search
entries were included.
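The LASSO and cross-validation procedure described above can be illustrated with a minimal sketch. It is not the author's implementation: the data are synthetic stand-ins for daily search volumes and AQI readings, the penalty grid and all function names (`lasso_fit`, `cv_select_lambda`, `r_squared`) are assumptions made for illustration, and the solver is a plain coordinate-descent routine written with the Python standard library only. In practice an established routine such as scikit-learn's `LassoCV` would normally be used instead.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    resid = list(y)  # residual y - b0 - X b, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        # The intercept is not penalised: move it to the mean residual.
        m = sum(resid) / n
        b0 += m
        resid = [r - m for r in resid]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return b0, beta


def predict(X, b0, beta):
    return [b0 + sum(x * b for x, b in zip(row, beta)) for row in X]


def cv_select_lambda(X, y, lambdas, k=5):
    """Return the penalty with the lowest mean k-fold validation MSE."""
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]
    best_lam, best_mse = None, float("inf")
    for lam in lambdas:
        fold_mse = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n) if i not in held]
            b0, beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            preds = predict([X[i] for i in held_out], b0, beta)
            sq_err = [(y[i] - q) ** 2 for i, q in zip(held_out, preds)]
            fold_mse.append(sum(sq_err) / len(sq_err))
        mse = sum(fold_mse) / k
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam


def r_squared(y, preds):
    """Share of the variance in y explained by the predictions."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, preds))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT the Baidu or AQI data): 120 "days",
# 12 "search terms", of which only the first three drive the response.
rng = random.Random(0)
n_days, n_terms = 120, 12
X = [[rng.gauss(0, 1) for _ in range(n_terms)] for _ in range(n_days)]
true_beta = [30.0, -20.0, 15.0] + [0.0] * (n_terms - 3)
y = [100 + sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 10)
     for row in X]

lambdas = [0.1, 1.0, 3.0, 10.0, 30.0]  # hypothetical penalty grid
lam = cv_select_lambda(X, y, lambdas)
b0, beta = lasso_fit(X, y, lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
r2 = r_squared(y, predict(X, b0, beta))
print("chosen penalty:", lam)
print("predictors kept:", selected)
print("in-sample R^2:", round(r2, 3))
```

The surviving predictors are the columns whose coefficients remain non-zero at the cross-validated penalty, and the R² reported here is the in-sample share of variance explained, which is how a figure such as 0.69 would be read.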
Index Terms—Air quality index prediction, search entries,
justification of air quality index, lasso, cross-validation.
Fengyuan Pan is with University College London, UK (email:
fengyuan.pan.15@ucl.ac.uk).
Cite: Fengyuan Pan, "Estimation of Beijing Air Quality Index Using Baidu
Search Entries," International Journal of Social Science and Humanity, vol. 8, no. 7, pp. 220-224, 2018.