# Clubhouse Network Research Data Export

## Overview

This export contains social network data from three clubhouses, collected for the "Designing Belonging" research project, which studies social networks and engagement in mental health clubhouses.

- **Principal Investigator:** Dr. Joy Agner (USC Chan Division of Occupational Science and Occupational Therapy)
- **Study Locations:** Waipahu, Koolau, Hale Oluea
- **Total Interactions:** 4,923

## Data Collection Methods

- **Observational social network analysis:** Trained observers recorded interactions
- **Directional data:** Source → Target interactions with timestamps
- **Engagement scoring:** Quality and type of interactions measured
- **Spatial context:** Room locations tracked for environmental analysis

## File Formats Included

### 1. Original JSON (`01-original-json/`)
- Complete original research data format
- Includes all metadata, engagement scores, and temporal information
- Best for computational analysis requiring full context

### 2. CSV Format (`02-csv/`)
- **edges.csv:** Source-target interaction pairs with attributes
- **nodes.csv:** Individual participants with roles and locations
- Compatible with Excel, R, Python pandas, and most statistical software
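
As a minimal sketch of working with the edge list using only the Python standard library, the snippet below counts how often each participant initiates and receives interactions. The sample rows and participant codes are made up for illustration, and the column names are assumed to match the Data Dictionary; check the header of the real `edges.csv` before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows, NOT real study data. The column names are
# assumed to follow the Data Dictionary in this README.
SAMPLE_EDGES = """source,target,interaction_type,valence,room
P01,P02,Verbal,Positive,blue
P01,P03,Non-Verbal,Neutral,blue
P02,P01,Verbal,Positive,green_big
"""


def degree_counts(csv_file):
    """Return (out_degree, in_degree) Counters from an open edge-list file."""
    out_deg, in_deg = Counter(), Counter()
    for row in csv.DictReader(csv_file):
        out_deg[row["source"]] += 1  # interactions initiated
        in_deg[row["target"]] += 1   # interactions received
    return out_deg, in_deg


out_deg, in_deg = degree_counts(io.StringIO(SAMPLE_EDGES))
print("initiated:", dict(out_deg))
print("received:", dict(in_deg))
```

To run this against the export itself, pass an open file instead of the in-memory sample, for example `open("02-csv/edges.csv", newline="")` (the exact path is an assumption based on the folder name above).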

### 3. GraphML Format (`03-graphml/`)
- Standard XML-based format for network analysis
- Compatible with: Cytoscape, NetworkX (Python), igraph (R), yEd
- Preserves node and edge attributes with proper typing

### 4. Pajek Format (`04-pajek/`)
- Legacy format still widely used in social network analysis
- Compatible with: Pajek, UCINET, NetworkX, igraph
- Color-coded by role (Red=Staff, Blue=Member, Green=Guest)

### 5. GEXF Format (`05-gexf/`)
- Graph Exchange XML Format (GEXF), the native format of the Gephi visualization software
- Includes temporal and attribute information
- Best for creating publication-quality network visualizations

### 6. Adjacency Matrix (`06-matrix/`)
- Binary matrix format (1=interaction exists, 0=no interaction)
- Compatible with matrix-based analysis in MATLAB, R, Python
- Useful for mathematical network analysis and centrality calculations
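
The sketch below shows the kind of calculation a binary matrix supports: degree counts and network density. The 3x3 matrix and labels are hypothetical, and the orientation (rows as sources, columns as targets) is an assumption; confirm it against the exported matrix file before interpreting in-degree and out-degree.

```python
# Hypothetical 3x3 binary matrix, NOT real study data.
# Assumption: rows are sources and columns are targets, both in the
# same participant order as `labels`.
labels = ["P01", "P02", "P03"]
matrix = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]


def degrees(matrix):
    """Return (out_degree, in_degree) lists from a binary adjacency matrix."""
    out_degree = [sum(row) for row in matrix]        # row sums
    in_degree = [sum(col) for col in zip(*matrix)]   # column sums
    return out_degree, in_degree


def density(matrix):
    """Share of possible directed ties (excluding self-ties) that are present."""
    n = len(matrix)
    ties = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


out_degree, in_degree = degrees(matrix)
for label, out_d, in_d in zip(labels, out_degree, in_degree):
    print(label, "out:", out_d, "in:", in_d)
print("density:", density(matrix))
```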

## Data Dictionary

### Node Attributes
- **id:** Unique participant identifier (de-identified)
- **role:** Staff, Member, Guest, or Unknown
- **room:** Physical location code (e.g., green_big, blue, smoking)
- **clubhouse:** Waipahu, Koolau, or Hale Oluea

### Edge Attributes
- **source/target:** Participant IDs (source initiates interaction)
- **interaction_type:** Verbal or Non-Verbal
- **timestamp:** ISO 8601 format (e.g., 2023-05-05T09:00:00Z)
- **date:** Date of observation (YYYY-MM-DD)
- **valence:** Interaction quality (Positive, Negative, or Neutral)
- **engagement_score:** Numeric engagement rating (when available)
- **room:** Physical location where interaction occurred
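
When reading edges from a text format such as CSV, every attribute arrives as a string. The following is a minimal sketch of converting the `timestamp` and `engagement_score` fields into typed values, assuming the field names above and assuming that a missing score appears as an empty string or the literal `null`. The sample row is hypothetical.

```python
from datetime import datetime, timezone


def parse_edge(row):
    """Return a copy of one edge row with typed timestamp and score."""
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    ts = datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00"))

    # engagement_score is only present "when available"; keep gaps as None.
    raw_score = (row.get("engagement_score") or "").strip()
    score = None if raw_score.lower() in ("", "null") else float(raw_score)

    return {**row, "timestamp": ts, "engagement_score": score}


# Hypothetical row, NOT real study data.
edge = parse_edge({
    "source": "P01",
    "target": "P02",
    "timestamp": "2023-05-05T09:00:00Z",
    "engagement_score": "",
})
print(edge["timestamp"].isoformat(), edge["engagement_score"])
```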

## Data Integrity & Ethics

- All data is **de-identified** with no Protected Health Information (PHI)
- Participant IDs are research codes, not linked to personal identifiers
- Data collection approved by institutional review boards
- Research team members (coded AA, BB, CC, etc.) were excluded from the analysis
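
If you need to check that no research-team codes remain in a file, or to reapply the exclusion yourself, a filter along these lines may help. It is only a sketch: the pattern assumes team codes are always one uppercase letter repeated twice, which is inferred from the examples above and is not a documented rule. The sample edges are hypothetical.

```python
import re

# Assumption: research-team codes are a doubled uppercase letter (AA, BB, ...).
TEAM_CODE = re.compile(r"^([A-Z])\1$")


def is_team_member(participant_id):
    """True if the ID looks like a research-team code under the assumption above."""
    return bool(TEAM_CODE.match(participant_id))


def drop_team_edges(edges):
    """Keep only edges where neither endpoint is a research-team code."""
    return [
        e for e in edges
        if not (is_team_member(e["source"]) or is_team_member(e["target"]))
    ]


# Hypothetical edges, NOT real study data.
sample = [
    {"source": "P01", "target": "P02"},
    {"source": "AA", "target": "P01"},
    {"source": "P02", "target": "BB"},
]
kept = drop_team_edges(sample)
print(kept)
```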

## Recommended Analysis Tools

### Network Analysis Software
- **R:** igraph, tidygraph, sna packages
- **Python:** NetworkX, graph-tool, SNAP (Snap.py)
- **Gephi:** Free visualization platform
- **Cytoscape:** Biological/social network analysis
- **UCINET:** Traditional social network analysis

### Statistical Platforms
- **R/RStudio:** Full statistical and network analysis
- **Python:** Pandas + NetworkX for data science workflows
- **SPSS/Stata:** With network analysis extensions

## Citation Information

When using this data, please cite:

```
Agner, J., et al. (2025). Designing Belonging: Social Network Analysis in
Mental Health Clubhouses. University of Southern California.
```

## Contact

For questions about this dataset or research collaboration:
- **Principal Investigator:** Dr. Joy Agner, USC Chan Division
- **Data Processing:** Generated on 11/10/2025

## Technical Notes

- **Coordinate system:** Room-based spatial context preserved
- **Temporal resolution:** Hourly timeslots with specific timestamps
- **Missing data:** Coded as null/empty values, not filtered out
- **Data quality:** All entries validated against original observation sheets
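
Because missing values are kept in the export rather than filtered out, analysis code has to skip or handle them explicitly. The sketch below groups interaction timestamps into hourly timeslots and ignores empty entries. It assumes timestamps are ISO 8601 strings as described in the Data Dictionary; the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime


def interactions_per_hour(timestamps):
    """Count interactions per hourly timeslot, skipping missing timestamps."""
    counts = Counter()
    for ts in timestamps:
        if not ts:  # None or empty string: missing value, left out of the count
            continue
        # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[dt.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Hypothetical timestamps, NOT real study data.
hourly = interactions_per_hour([
    "2023-05-05T09:00:00Z",
    "2023-05-05T09:30:00Z",
    "2023-05-05T10:15:00Z",
    "",
    None,
])
for slot, n in sorted(hourly.items()):
    print(slot, n)
```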
