movies-cast-genre-multiplex
Description
Multiplex movie-cast hypergraph built from The Movies Dataset. Nodes are actors, hyperedges are movie casts, and layers are movie genres. Movies with multiple genres are added to every corresponding genre layer.
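
The layering rule above can be illustrated with a minimal sketch. The toy records, the `build_layers` helper, and the dict-of-lists representation are all hypothetical and are not the dataset's actual storage format; the sketch only shows how a movie with several genres ends up as a hyperedge in each of its genre layers.

```python
from collections import defaultdict

# Hypothetical toy records; the real dataset derives these fields from
# movies_metadata.csv and credits.csv.
movies = [
    {"title": "Movie A", "genres": ["Comedy", "Family"], "cast": [1, 2, 3]},
    {"title": "Movie B", "genres": ["Comedy"], "cast": [2, 4]},
]


def build_layers(records):
    """Map each genre layer to a list of cast hyperedges (frozensets of actor ids)."""
    layers = defaultdict(list)
    for movie in records:
        cast = frozenset(movie["cast"])
        if not cast:
            continue  # assumption: a movie without cast yields no hyperedge
        for genre in movie["genres"]:
            # A multi-genre movie is added to every one of its genre layers.
            layers[genre].append(cast)
    return dict(layers)


layers = build_layers(movies)
# Nodes are the actors that appear in at least one hyperedge.
nodes = set().union(*(edge for edges in layers.values() for edge in edges))
```

Here "Movie A" appears in both the Comedy and Family layers, while the node set is shared across layers.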
Basic statistics
- Nodes: 202689
- Hyperedges: 86918
- Unique hyperedges: 86918
- Max size hyperedge: 312
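
As a rough illustration of how statistics like these could be computed, the sketch below works on a toy list of `(layer, members)` pairs. The `basic_stats` helper is hypothetical, and treating two hyperedges as identical only when both layer and member set match is an assumption, not a documented rule of this dataset.

```python
# Hypothetical (layer, members) pairs standing in for the real hyperedges.
edges = [
    ("Comedy", frozenset({1, 2, 3})),
    ("Comedy", frozenset({2, 4})),
    ("Family", frozenset({1, 2, 3})),
]


def basic_stats(edge_list):
    """Compute node count, hyperedge counts, and the largest hyperedge size."""
    nodes = set()
    for _, members in edge_list:
        nodes |= members
    return {
        "nodes": len(nodes),
        "hyperedges": len(edge_list),
        # Assumption: uniqueness is judged per (layer, member set) pair.
        "unique_hyperedges": len(set(edge_list)),
        "max_size": max((len(members) for _, members in edge_list), default=0),
    }


stats = basic_stats(edges)
```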
Hypergraph metadata
| Property | Description |
|---|---|
| name | (STRING) Dataset name (e.g., movies-cast-genre-multiplex). |
| type | (STRING) Hypergraph type (e.g., MultiplexHypergraph). |
| version | (STRING) Dataset version (e.g., 1.0.0). |
| weighted | (BOOL) Whether repeated movie casts in the same genre layer are stored as edge weights (e.g., true). |
| node_type | (STRING) Semantic type of nodes (e.g., actor). |
| edge_type | (STRING) Semantic type of hyperedges (e.g., movie_cast). |
| layer_type | (STRING) Semantic type of multiplex layers (e.g., genre). |
Node metadata
| Property | Description |
|---|---|
| tmdb_person_id | (INT) TMDB person identifier for the actor (e.g., 31). |
| name | (STRING) Actor name from credits.csv (e.g., Tom Hanks). |
| gender | (INT) TMDB gender code for the actor; 0=unknown/not specified, 1=female, 2=male (e.g., 2). |
Hyperedge metadata
| Property | Description |
|---|---|
| layer | (STRING) Movie genre layer (e.g., Comedy). |
| movie_ids | (LIST[STRING]) TMDB movie identifiers represented by this cast hyperedge in the layer (e.g., [862]). |
| titles | (LIST[STRING]) Movie titles represented by this cast hyperedge in the layer (e.g., [Toy Story]). |
| original_languages | (LIST[STRING]) Original language codes for the represented movies (e.g., [en]). |
| release_years | (LIST[INT]) Release years for represented movies when available (e.g., [1995]). |
| weight | (INT) Number of movies collapsed into this normalized cast hyperedge in the same genre layer. |
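
The relationship between `weight` and the list-valued properties can be sketched as follows. This is one possible reading, assuming that a "normalized" cast means an order-insensitive, de-duplicated set of actor ids; the `collapse` helper and the toy records are hypothetical and not taken from the dataset's build scripts.

```python
# Hypothetical per-movie, per-genre records.
records = [
    {"movie_id": "10", "title": "First Cut", "genre": "Drama", "cast": [5, 6], "year": 1990},
    {"movie_id": "11", "title": "Second Cut", "genre": "Drama", "cast": [6, 5], "year": 1992},
    {"movie_id": "12", "title": "Other", "genre": "Comedy", "cast": [5, 6], "year": None},
]


def collapse(rows):
    """Merge movies that share the same cast within the same genre layer."""
    merged = {}
    for row in rows:
        # Assumption: normalization ignores cast order and duplicate ids.
        key = (row["genre"], frozenset(row["cast"]))
        edge = merged.setdefault(key, {
            "layer": row["genre"],
            "members": sorted(set(row["cast"])),
            "movie_ids": [],
            "titles": [],
            "release_years": [],
            "weight": 0,
        })
        edge["movie_ids"].append(row["movie_id"])
        edge["titles"].append(row["title"])
        if row["year"] is not None:  # release years are kept only when available
            edge["release_years"].append(row["year"])
        edge["weight"] += 1
    return list(merged.values())


hyperedges = collapse(records)
```

In this toy input the two Drama movies share a cast and collapse into one hyperedge with `weight` 2, while the same cast in the Comedy layer stays a separate hyperedge with `weight` 1.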
Hyperedge size distribution
Hyperdegree distribution
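
For readers who want to recompute the two distributions named above, a minimal sketch follows. It assumes the usual definitions (hyperedge size is the number of nodes in a hyperedge; hyperdegree is the number of hyperedges a node belongs to) and uses hypothetical toy data; whether the published plots count weights or layers differently is not stated here.

```python
from collections import Counter

# Hypothetical hyperedges given as sets of actor ids.
toy_edges = [{1, 2, 3}, {2, 4}, {3}]

# Hyperedge size distribution: size -> number of hyperedges of that size.
size_distribution = Counter(len(edge) for edge in toy_edges)

# Hyperdegree of each node: how many hyperedges contain it.
hyperdegree = Counter(node for edge in toy_edges for node in edge)

# Hyperdegree distribution: degree -> number of nodes with that degree.
hyperdegree_distribution = Counter(hyperdegree.values())
```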
Download
- Version 1.0.0: Binary (9.9 MB), JSON (6.3 MB)
Provenance
Source: https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset
License: CC0 Public Domain
Data derived from The Movies Dataset on Kaggle using movies_metadata.csv and credits.csv. The Kaggle dataset is listed as CC0: Public Domain; the movie details and credits were collected from the TMDB Open API. This product uses TMDB data but is not endorsed or certified by TMDB.
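
A common first step when working with these two CSV files is decoding their list-valued columns. The sketch below assumes that the genre and cast cells are stored as stringified Python lists of dicts, which is how the Kaggle files are generally distributed, but this should be checked against the actual download. The `parse_records` helper and the sample cells are illustrative only.

```python
import ast


def parse_records(cell):
    """Parse a stringified list of dicts; return [] for empty or malformed cells."""
    if not isinstance(cell, str) or not cell.strip():
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, dict)]


# Illustrative cells (assumed format, not copied from the files).
genres_cell = "[{'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]"
cast_cell = "[{'id': 31, 'name': 'Tom Hanks', 'gender': 2}]"

genres = [g["name"] for g in parse_records(genres_cell)]
actors = [(c["id"], c["name"], c["gender"]) for c in parse_records(cast_cell)]
```

Using `ast.literal_eval` rather than `eval` keeps the parsing restricted to Python literals, which matters when the input is an untrusted CSV.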
Reproducibility: Instructions and scripts
Citation
When this data is used in published research or for visualization purposes, please cite the following:
No BibTeX entry is currently available. Please refer to the original source: https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset