Estimating the population total in undeveloped and developing countries has attracted considerable research interest in the recent past, because such estimates guide the planning of resource allocation, personnel training and infrastructure in the social, health, transport, communication and education sectors. In many countries a comprehensive census is conducted every ten years, whereas the government administration changes every four to five years because of constitutional term limits, so a change of administration does not coincide with a census. Further, the emerging COVID-19 pandemic requires ministry of health social-distancing protocols, so census surveys, which commonly rely on questionnaires and personal interviews, need to be avoided. There is therefore a need for better and more reliable models for estimating the population total, which is the main focus of this study. Existing and newly developed exponential and logistic classes of population total estimating models are considered and compared. The main difficulty with logistic models is estimating the highest population that each administrative unit can attain. In this study a machine learning logistic regression is proposed and incorporated to search for and estimate this constant through a supervised learning process. The performance of the methods was compared using the Root Mean Square Error (RMSE), with recorded values of 1.062, 1.524, 0.477, 0.819 and 0.286 for the exponential, logistic I, logistic II, logistic III and machine learning logistic (logistic IV) models respectively; the proposed model performed best, with the lowest RMSE of 0.286. The proposed model was then used to project the population total for all regions as 51.00, 55.02, 62.50, 69.10, 74.65 and 79.14 million in the years 2024, 2029, 2039, 2049, 2059 and 2069 respectively.
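The idea of searching for the logistic upper bound and scoring each candidate by RMSE can be illustrated with a minimal sketch. The abstract does not specify the learning algorithm, so the code below substitutes a plain grid search over candidate upper bounds K combined with a linearized least-squares fit; it is not the author's procedure. The curve form P(t) = K / (1 + A exp(-r t)), the function names, the candidate grid and the synthetic data are all assumptions made for illustration, and no census figures are used.

```python
import math


def logistic(t, K, A, r):
    """Logistic growth curve P(t) = K / (1 + A * exp(-r * t))."""
    return K / (1.0 + A * math.exp(-r * t))


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def fit_given_K(ts, ps, K):
    """For a fixed K, fit ln(K/P - 1) = ln(A) - r*t by ordinary least squares."""
    ys = [math.log(K / p - 1.0) for p in ps]
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope  # (A, r)


def search_K(ts, ps, candidates):
    """Return (rmse, K, A, r) for the candidate K with the lowest RMSE."""
    best = None
    for K in candidates:
        if K <= max(ps):
            continue  # the upper bound must exceed every observed value
        A, r = fit_given_K(ts, ps, K)
        err = rmse(ps, [logistic(t, K, A, r) for t in ts])
        if best is None or err < best[0]:
            best = (err, K, A, r)
    return best


# Synthetic, noise-free data generated from known (hypothetical) parameters.
TRUE_K, TRUE_A, TRUE_R = 100.0, 9.0, 0.05
ts = [0, 10, 20, 30, 40, 50]
ps = [logistic(t, TRUE_K, TRUE_A, TRUE_R) for t in ts]

candidates = [60.0 + 5.0 * i for i in range(17)]  # 60, 65, ..., 140
best_err, best_K, best_A, best_r = search_K(ts, ps, candidates)
print(f"K={best_K:.1f}  A={best_A:.3f}  r={best_r:.4f}  RMSE={best_err:.2e}")
```

Because the synthetic data are noise-free, the search recovers the generating upper bound; with real census counts the minimum-RMSE candidate would only be an estimate, and a finer grid or an iterative optimizer could replace the coarse grid assumed here.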
Published in | American Journal of Theoretical and Applied Statistics (Volume 10, Issue 1) |
DOI | 10.11648/j.ajtas.20211001.14 |
Page(s) | 22-31 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2021. Published by Science Publishing Group |
Population Total Estimates, Growth, COVID-19, Logistic Regression and Projection
APA Style
Thomas Mageto. (2021). Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective. American Journal of Theoretical and Applied Statistics, 10(1), 22-31. https://doi.org/10.11648/j.ajtas.20211001.14
ACS Style
Thomas Mageto. Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective. Am. J. Theor. Appl. Stat. 2021, 10(1), 22-31. doi: 10.11648/j.ajtas.20211001.14
AMA Style
Thomas Mageto. Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective. Am J Theor Appl Stat. 2021;10(1):22-31. doi: 10.11648/j.ajtas.20211001.14
@article{10.11648/j.ajtas.20211001.14,
  author = {Thomas Mageto},
  title = {Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective},
  journal = {American Journal of Theoretical and Applied Statistics},
  volume = {10},
  number = {1},
  pages = {22-31},
  doi = {10.11648/j.ajtas.20211001.14},
  url = {https://doi.org/10.11648/j.ajtas.20211001.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20211001.14},
  year = {2021}
}
TY - JOUR
T1 - Estimating Population Total Using Machine Learning Logistic Regression: COVID-19 Pandemic Challenges Perspective
AU - Thomas Mageto
Y1 - 2021/01/22
PY - 2021
N1 - https://doi.org/10.11648/j.ajtas.20211001.14
DO - 10.11648/j.ajtas.20211001.14
T2 - American Journal of Theoretical and Applied Statistics
JF - American Journal of Theoretical and Applied Statistics
JO - American Journal of Theoretical and Applied Statistics
SP - 22
EP - 31
PB - Science Publishing Group
SN - 2326-9006
UR - https://doi.org/10.11648/j.ajtas.20211001.14
VL - 10
IS - 1
ER -