Optimization of the Random Forest Method Using Principal Component Analysis to Predict House Prices

A Case Study of House Prices in Malang City

Authors

  • Emha Ahdan Fahmi Elmuna, Faculty of Science and Technology, Program Specification for Master Study in Computer Science, Universitas Islam Negeri Maulana Malik Ibrahim, Indonesia
  • Totok Chamidy, Faculty of Science and Technology, Program Specification for Master Study in Computer Science (Magister Informatika), Universitas Islam Negeri Maulana Malik Ibrahim, Indonesia
  • Fresy Nugroho, Faculty of Science and Technology, Program Specification for Master Study in Computer Science (Magister Informatika), Universitas Islam Negeri Maulana Malik Ibrahim, Indonesia

DOI:

https://doi.org/10.25008/ijadis.v4i2.1290

Keywords:

House Price Prediction, Random Forest Method, Principal Component Analysis, Data Mining, Regression, RMSE

Abstract

Property is an attractive form of investment, and developers must set property prices carefully. Property prices tend to rise from year to year, over both the short and the long term, and rarely fall. Prices are often set according to the features of a house, such as its concept, location, and number of bedrooms. Random forest performs well at predicting house prices from such features. However, when too many variables are used, the training process takes longer and feature selection tends to pick features that are not informative. One way to reduce the number of features without discarding any of them outright is Principal Component Analysis (PCA). This research therefore combines PCA with Random Forest. In model evaluation, the model using PCA produced a smaller and more consistent error, averaging 0.018, whereas Random Forest alone, without PCA, produced a higher error, averaging 0.03125. Training was also faster with PCA, averaging 7918 milliseconds, compared with 8975 milliseconds for Random Forest without PCA.
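
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset in place of the Malang house data, and picks the number of components (5), the number of trees (50), and the helper name fit_and_score arbitrarily. The error metric is RMSE, taken from the keyword list. The printed numbers will not match the values reported above, and the sketch makes no claim about which variant performs better.

```python
import time

import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_score(model, X_train, X_test, y_train, y_test):
    """Fit the model; return (RMSE on the test split, training time in ms)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_ms = (time.perf_counter() - start) * 1000.0
    rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
    return rmse, train_ms


# Assumption: synthetic regression data stands in for the house features.
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baseline: Random Forest trained on all original features.
rf_only = RandomForestRegressor(n_estimators=50, random_state=0)

# Variant: standardize, project onto 5 principal components (arbitrary
# choice), then train the same Random Forest on the reduced features.
pca_rf = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    RandomForestRegressor(n_estimators=50, random_state=0),
)

results = {
    "rf_only": fit_and_score(rf_only, X_train, X_test, y_train, y_test),
    "pca_rf": fit_and_score(pca_rf, X_train, X_test, y_train, y_test),
}

for name, (rmse, train_ms) in results.items():
    print(f"{name}: RMSE={rmse:.3f}, training time={train_ms:.0f} ms")
```

Placing the scaler and PCA inside the pipeline means both are fitted on the training split only, so the test split does not leak into the projection. The timing for the PCA variant includes the cost of fitting the scaler and PCA as well as the forest.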

Published

2023-10-06

How to Cite

Elmuna, E. A. F., Chamidy, T., & Nugroho, F. (2023). Optimization of the Random Forest Method Using Principal Component Analysis to Predict House Prices: A Case Study of House Prices in Malang City. International Journal of Advances in Data and Information Systems, 4(2), 155-166. https://doi.org/10.25008/ijadis.v4i2.1290