Features Selection for Entity Resolution in Prostitution on Twitter

Authors

  • Reisa Permatasari Department of Information System, Institut Teknologi Sepuluh Nopember
  • Nur Aini Rakhmawati Department of Information System, Institut Teknologi Sepuluh Nopember

DOI:

https://doi.org/10.25008/ijadis.v2i1.1214

Keywords:

entity resolution, online prostitution, regularized logistic regression, twitter

Abstract

Entity resolution is the process of determining whether two references to real-world objects refer to the same entity or to different ones. This study applies entity resolution to a Twitter prostitution dataset in two ways: a feature-based approach using Dedupe, which trains a regularized logistic regression model with active learning, and a graph-based approach using Neo4j and Node2Vec. The maximum similarity, 1, is reached when the full set of features (personal, location, and bio specifications) is available. The minimum similarity, 0.025662627, is associated with the amount of negative training data. The most influential similarity feature is the cellphone number, with similarity values ranging from 0.997678459 at the lowest to 0.999993523. The length of walk per source parameter gives the best similarity accuracy, reaching 71.4% (14 predictions, of which 10 were obtained correctly).
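
To illustrate the feature-based approach named in the abstract, the following is a minimal sketch. It is not the authors' implementation and does not use Dedupe's API. It assumes each candidate pair of accounts has already been reduced to three per-field similarity scores (phone, location, bio), and it fits an L2-regularized logistic regression with plain gradient descent. The toy data, the function names `train` and `match_probability`, and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, l2=0.01, lr=0.5, epochs=2000):
    """Fit L2-regularized logistic regression by batch gradient descent.

    pairs  : list of per-field similarity vectors, one per candidate pair
    labels : 1 if the two accounts are the same entity, else 0
    """
    n_features = len(pairs[0])
    n = len(pairs)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(pairs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(n_features):
            # The l2 * w[j] term is the regularization penalty on the weights.
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n  # the bias is left unregularized
    return w, b


def match_probability(x, w, b):
    """Probability that a candidate pair refers to the same entity."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical toy data: [phone_sim, location_sim, bio_sim] per account pair.
X = [
    [1.0, 1.0, 0.9],
    [1.0, 0.2, 0.4],
    [0.0, 0.9, 0.3],
    [0.0, 0.1, 0.1],
    [1.0, 0.8, 0.7],
    [0.0, 0.5, 0.6],
]
y = [1, 1, 0, 0, 1, 0]

w, b = train(X, y)
p_same = match_probability([1.0, 1.0, 0.9], w, b)
p_diff = match_probability([0.0, 0.1, 0.1], w, b)
```

In this toy data the phone similarity separates matches from non-matches on its own, which loosely mirrors the abstract's observation that the cellphone number is the most influential feature. Real pipelines would compute the similarity scores from raw fields and choose labels through active learning rather than by hand.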

Published

2021-03-27

How to Cite

Permatasari, R., & Rakhmawati, N. A. (2021). Features Selection for Entity Resolution in Prostitution on Twitter. International Journal of Advances in Data and Information Systems, 2(1), 53-61. https://doi.org/10.25008/ijadis.v2i1.1214
