VLDB 2019 Tutorial: Combating Fake News: A Data Management and Mining Perspective

Laks V.S. Lakshmanan (1), Michael Simpson (1), Saravanan Thirumuruganathan (2)
(1) University of British Columbia; (2) QCRI, HBKU

Time: Aug 27, 2019, Tuesday, 16:00 -- 17:30
Location: TBA.

Abstract

Fake news is a major threat to global democracy, resulting in diminished trust in government, journalism and civil society. The public popularity of social media and social networks has caused a contagion of fake news where conspiracy theories, disinformation and extreme views flourish. Detection and mitigation of fake news is one of the fundamental problems of our times and has attracted widespread attention. While fact-checking websites such as Snopes and PolitiFact and major companies such as Google, Facebook, and Twitter have taken preliminary steps towards addressing fake news, much more remains to be done. As an interdisciplinary topic, fake news has been studied from various angles by communities as diverse as machine learning, databases, journalism, political science and many more.

The objective of this tutorial is two-fold. First, we wish to familiarize the database community with the efforts by other communities on combating fake news. We provide a panoramic view of the state of the art of research on various aspects of fake news, including detection, propagation, mitigation, and intervention. Next, we provide a concise and intuitive summary of prior research by the database community and discuss how it could be used to counteract fake news. The tutorial covers research from areas such as data integration, truth discovery and fusion, probabilistic databases, knowledge graphs and crowdsourcing through the lens of fake news. Effective tools for addressing fake news can only be built by leveraging the synergistic relationship between the database community and other research communities. We hope that our tutorial provides an impetus towards such a synthesis of ideas and the creation of new ones.

Slides

The slides can be found here.

Fake News Primer

References

  • Lei Guo and Chris Vargo, “Fake News” and Emerging Online Media Ecosystem: An Integrated Intermedia Agenda-Setting Analysis of the 2016 U.S. Presidential Election. Communication Research, June 2018.
  • Wu, Agrawal, Li, Yang, and Yu. Computational Fact-Checking through Query Perturbations. ACM TODS 2017.

Propagation of Fake News

References

  • David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network. KDD 2003.
  • Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Wei Chen, and Yajun Wang. Prominent features of rumor propagation in online social media. ICDM 2013.
  • Soroush Vosoughi, Deb Roy and Sinan Aral. The spread of true and false news online. Science 2018.
  • Xinyi Zhou, Reza Zafarani. Fake News: A Survey of Research, Detection Methods, and Opportunities. arXiv preprint. 2018.

Detection of Fake News

References (Data Integration, Truth Discovery & Fusion)

  • Jing Gao, Qi Li, Bo Zhao, Wei Fan, and Jiawei Han. Truth discovery and crowdsourcing aggregation: A unified perspective. PVLDB 2015.
  • Yannis Katsis, Yannis Papakonstantinou. View-based data integration. Encyclopedia of Database Systems. 2009.
  • Theodoros Rekatsinas, Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran, and Christopher Ré. Slimfast: Guaranteed results for data fusion and source reliability. SIGMOD 2017.
  • Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Kevin Murphy, Shaohua Sun, and Wei Zhang. From data fusion to knowledge fusion. PVLDB 2014.
  • Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. Integrating conflicting data: the role of source dependence. PVLDB 2009.

References (ML-based Detection)

  • Subhabrata Mukherjee and Gerhard Weikum. Leveraging Joint Interactions for Credibility Analysis in News Communities. CIKM 2015.
  • SVN Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. Graph kernels. JMLR 2010.
  • Xinyi Zhou, Reza Zafarani, Kai Shu, and Huan Liu. Fake news: Fundamental theories, detection strategies and challenges. WSDM 2019.
  • Xinyi Zhou, Reza Zafarani. Fake News: A Survey of Research, Detection Methods, and Opportunities. arXiv preprint. 2018.
  • Ke Wu, Song Yang, and Kenny Q. Zhu. False rumors detection on Sina Weibo by propagation structures. ICDE 2015.

References (Knowledge Graph-based Approaches)

  • Akrami, Farahnaz, et al. "Re-evaluating Embedding-Based Knowledge Graph Completion Methods." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2018.
  • Bordes, Antoine, et al. "Translating embeddings for modeling multi-relational data." Advances in neural information processing systems. 2013.
  • Chang, Lijun, et al. "Optimal enumeration: Efficient top-k tree matching." Proceedings of the VLDB Endowment 8.5 (2015): 533-544.
  • Cheng, Jiefeng, Xianggang Zeng, and Jeffrey Xu Yu. "Top-k graph pattern matching over large graphs." 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 2013.
  • Ciampaglia, Giovanni Luca, et al. "Computational fact checking from knowledge networks." PloS one 10.6 (2015): e0128193.
  • Hamilton, Will, et al. "Embedding logical queries on knowledge graphs." Advances in Neural Information Processing Systems. 2018.
  • Jeh, Glen and Jennifer Widom. SimRank: a measure of structural-context similarity. KDD (2002).
  • Kazemi, Seyed Mehran, and David Poole. "Simple embedding for link prediction in knowledge graphs." Advances in Neural Information Processing Systems. 2018.
  • Lao, Ni, and William W. Cohen. "Relational retrieval using a combination of path-constrained random walks." Machine learning 81.1 (2010): 53-67.
  • Lin, Peng, et al. "Discovering graph patterns for fact checking in knowledge graphs." International Conference on Database Systems for Advanced Applications. Springer, Cham, 2018.
  • Lin, Yankai, et al. "Learning entity and relation embeddings for knowledge graph completion." Twenty-ninth AAAI conference on artificial intelligence. 2015.
  • Lü, Linyuan, Ci-Hang Jin, and Tao Zhou. "Similarity index based on local paths for link prediction of complex networks." Physical Review E 80.4 (2009): 046122.
  • Morales, Camilo, et al. "MateTee: A semantic similarity metric based on translation embeddings for knowledge graphs." International Conference on Web Engineering. Springer, Cham, 2017.
  • B. Shi and T. Weninger. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-Based Sys., 104:123–133, 2016.
  • Shi, Baoxu, and Tim Weninger. "ProjE: Embedding projection for knowledge graph completion." Thirty-First AAAI Conference on Artificial Intelligence. 2017.
  • P. Shiralkar, A. Flammini, F. Menczer, and G. L. Ciampaglia. Finding streams in knowledge graphs to support fact checking. In IEEE ICDM 2017, pp. 859–864, 2017.
  • Wang, Zhen, et al. "Knowledge graph embedding by translating on hyperplanes." Twenty-Eighth AAAI conference on artificial intelligence. 2014.
  • Xu, Zhongqi, Cunlai Pu, and Jian Yang. "Link prediction based on path entropy." Physica A: Statistical Mechanics and its Applications 456 (2016): 294-301.
  • Yang, Bishan, et al. "Embedding entities and relations for learning and inference in knowledge bases." arXiv preprint arXiv:1412.6575 (2014).
  • Yang, Shengqi, et al. "Schemaless and structureless graph querying." Proceedings of the VLDB Endowment 7.7 (2014): 565-576.
  • Yang, Shengqi, et al. "Fast top-k search in knowledge graphs." 2016 IEEE 32nd international conference on data engineering (ICDE). IEEE, 2016.

Mitigation and Intervention of Fake News

References

  • Bettencourt, Luís MA, et al. "The power of a good idea: Quantitative modeling of the spread of ideas from epidemiological models." Physica A: Statistical Mechanics and its Applications 364 (2006): 513-536.
  • Bharathi, Shishir, David Kempe, and Mahyar Salek. "Competitive influence maximization in social networks." International workshop on web and internet economics. Springer, Berlin, Heidelberg, 2007.
  • Budak, Ceren, Divyakant Agrawal, and Amr El Abbadi. "Limiting the spread of misinformation in social networks." Proceedings of the 20th international conference on World wide web. ACM, 2011.
  • Kempe, David, Jon Kleinberg, and Éva Tardos. "Maximizing the spread of influence through a social network." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.
  • Khalil, Elias Boutros, Bistra Dilkina, and Le Song. "Scalable diffusion-aware optimization of network topology." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.
  • Konstantinou, Loukas, Ana Caraban, and Evangelos Karapanos. "Combating Misinformation Through Nudging." Co-Inform Project. 2019.
  • Medya, Sourav, Arlei Silva, and Ambuj Singh. "Influence Minimization Under Budget and Matroid Constraints: Extended Version." arXiv preprint arXiv:1901.02156 (2019).
  • Nguyen, Nam P., et al. "Containment of misinformation spread in online social networks." Proceedings of the 4th Annual ACM Web Science Conference. ACM, 2012.
  • Prakash, B. Aditya, et al. "Threshold conditions for arbitrary cascade models on arbitrary networks." Knowledge and information systems 33.3 (2012): 549-575.
  • Prakash, B. Aditya, et al. "Fractional immunization in networks." Proceedings of the 2013 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2013.
  • Tong, Guangmo, et al. "An efficient randomized algorithm for rumor blocking in online social networks." IEEE Transactions on Network Science and Engineering (2017).
  • Tong, Hanghang, et al. "On the vulnerability of large graphs." 2010 IEEE International Conference on Data Mining. IEEE, 2010.
  • Vo, Nguyen, and Kyumin Lee. "The rise of guardians: Fact-checking url recommendation to combat fake news." The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 2018.
  • Zhang, Yao, and B. Aditya Prakash. "Dava: Distributing vaccines over networks under prior information." Proceedings of the 2014 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2014.
  • Zhang, Yao, and B. Aditya Prakash. "Data-aware vaccine allocation over large networks." ACM Transactions on Knowledge Discovery from Data (TKDD) 10.2 (2015): 20.
  • Zhao, Laijun, et al. "SIHR rumor spreading model in social networks." Physica A: Statistical Mechanics and its Applications 391.7 (2012): 2444-2453.

Future Opportunities

References

  • Lucas Graves. Understanding the promise and limits of automated fact-checking. Factsheet 2018.
  • Naeemul Hassan, Gensheng Zhang, Fatma Arslan, Josue Caraballo, Damian Jimenez, Siddhant Gawsane, Shohedul Hasan et al. ClaimBuster: the first-ever end-to-end fact-checking system. PVLDB 2017.
  • Rene Speck, Diego Esteves, Jens Lehmann, and Axel-Cyrille Ngonga Ngomo. DeFacto - a multilingual fact validation interface. ISWC 2015.

Presenters

Laks V.S. Lakshmanan

Laks V.S. Lakshmanan is a professor in the Department of Computer Science at the University of British Columbia. He is a Research Fellow of the BC Advanced Systems Institute and was named ACM Distinguished Scientist in November 2016. His research interests span a wide spectrum of topics in database systems and related areas, including: relational and object-oriented databases, advanced data models for novel applications, OLAP and data warehousing, database mining, data integration, semi-structured data and XML, directory-enabled networks, querying the WWW, information and social networks and social media, recommender systems, and personalization.

Michael Simpson

Michael Simpson is a Postdoctoral Researcher in the Department of Computer Science at the University of British Columbia. He earned his PhD from the University of Victoria. His research interests include data mining, social network analysis, and the design of scalable algorithms for graph problems.

Saravanan (Sara) Thirumuruganathan

Saravanan (Sara) Thirumuruganathan is a scientist in the Data Analytics group of QCRI, HBKU. He earned his PhD from the University of Texas at Arlington. He is broadly interested in data integration/cleaning and machine learning for data management. Saravanan’s work has been selected among best papers of VLDB 2018/2012 and also received a SIGMOD 2018 Research Highlight Award.