20newsW100

Tags:

Undirected

Description

Nodes represent news articles, and each hyperedge represents a word, containing the set of articles in which that word occurs. The data come from the UCI Categorical Machine Learning Repository and consist of 16,242 articles, each with binary occurrence values for 100 words. The dataset was constructed following Yang et al. (Chaoqi Yang, Ruijie Wang, Shuochao Yao, and Tarek Abdelzaher. Hypergraph learning with line expansion. arXiv preprint arXiv:2005.04843, 2020.).
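The construction described above can be sketched as follows, assuming the raw data is an article-by-word binary matrix: each word column becomes one hyperedge holding the indices of the articles whose entry is 1. The function name `build_hyperedges`, the word labels and the 4 x 3 matrix are hypothetical toy values, not the real 16,242 x 100 data.

```python
from typing import Dict, List, Set


def build_hyperedges(occurrence: List[List[int]], words: List[str]) -> Dict[str, Set[int]]:
    """Map each word (column) to the set of article indices (rows) in which it occurs."""
    hyperedges: Dict[str, Set[int]] = {word: set() for word in words}
    for article_id, row in enumerate(occurrence):
        if len(row) != len(words):
            raise ValueError("each row must have exactly one entry per word")
        for word, present in zip(words, row):
            if present:  # binary occurrence value: 1 means the word appears in this article
                hyperedges[word].add(article_id)
    return hyperedges


# Hypothetical toy data: 4 articles x 3 words.
words = ["space", "hockey", "windows"]
occurrence = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
edges = build_hyperedges(occurrence, words)
print(edges)  # {'space': {0, 1, 3}, 'hockey': {1}, 'windows': {2, 3}}
```

Keying hyperedges by word keeps the word-to-hyperedge correspondence explicit. With 100 words this yields exactly 100 hyperedges, matching the statistics below.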

Basic statistics

  • Nodes: 16242
  • Hyperedges: 100
  • Unique hyperedges: 100
  • Max size hyperedge: 2241
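The statistics listed above can be recomputed from a list of hyperedges with a short sketch like the one below. The function name and the toy input are hypothetical. The node count is passed in explicitly as an assumption, because an article containing none of the words would belong to no hyperedge and could not be recovered from the edges alone.

```python
from typing import Dict, FrozenSet, Iterable, List


def basic_statistics(num_nodes: int, hyperedges: Iterable[Iterable[int]]) -> Dict[str, int]:
    """Count hyperedges, distinct hyperedges, and the largest hyperedge size."""
    edges: List[FrozenSet[int]] = [frozenset(e) for e in hyperedges]
    return {
        "nodes": num_nodes,
        "hyperedges": len(edges),
        # hyperedges with identical member sets are counted once
        "unique_hyperedges": len(set(edges)),
        "max_hyperedge_size": max((len(e) for e in edges), default=0),
    }


# Hypothetical toy input: the second and fourth hyperedges are identical.
stats = basic_statistics(4, [{0, 1, 3}, {1}, {2, 3}, {1}])
print(stats)  # {'nodes': 4, 'hyperedges': 4, 'unique_hyperedges': 3, 'max_hyperedge_size': 3}
```

For this dataset the two hyperedge counts coincide (100 and 100), meaning no two words occur in exactly the same set of articles.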

Hyperedge size distribution

Hyperdegree distribution
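The two distributions named above can be sketched as histograms: the hyperedge size distribution counts how many hyperedges have each size, and the hyperdegree distribution counts how many nodes belong to each number of hyperedges (here, how many of the 100 words an article contains). This sketch assumes nodes are labeled 0 to `num_nodes - 1`; the function names and toy input are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def size_distribution(hyperedges: Iterable[Iterable[int]]) -> Counter:
    """Map each hyperedge size to the number of hyperedges of that size."""
    return Counter(len(set(e)) for e in hyperedges)


def hyperdegree_distribution(num_nodes: int, hyperedges: Iterable[Iterable[int]]) -> Counter:
    """Map each hyperdegree to the number of nodes with that hyperdegree."""
    degree: Counter = Counter()
    for e in hyperedges:
        for node in set(e):
            degree[node] += 1
    # nodes that appear in no hyperedge have hyperdegree 0
    return Counter(degree[n] for n in range(num_nodes))


# Hypothetical toy input: 5 nodes, node 4 is in no hyperedge.
toy_edges: List[Set[int]] = [{0, 1, 3}, {1}, {2, 3}]
print(sorted(size_distribution(toy_edges).items()))  # [(1, 1), (2, 1), (3, 1)]
print(sorted(hyperdegree_distribution(5, toy_edges).items()))  # [(0, 1), (1, 2), (2, 2)]
```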


Citation

When this data is used in published research or for visualization purposes, please cite the following:

@inproceedings{chien2022you,
    title={You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks},
    author={Eli Chien and Chao Pan and Jianhao Peng and Olgica Milenkovic},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=hpBTIv2uy_E}
}