Inventory
Document count: 40
Total characters: 236,901
Total tokens: ~38,928
Primary format: .txt
Vocabulary Profile
Unique terms: 4,610
Type-token ratio: 0.149
Richness: Moderate
Primary language: English
Document Characteristics
Length range: 423 - 18,348 chars
Mean length: 5,923 chars
Median length: 4,863 chars
Heterogeneity: Low
Prior Organization
Categories: ia-practice, ia-theory
Coverage: 100%
Source: URL structure
Split: 50/50
Feasibility Assessment
Topic Modeling: ✅ NMF recommended (6-10 topics)
Clustering: ✅ Hierarchical preferred
Taxonomy Validation: ✅ Existing categories ready for validation
Concept Graph: ✅ Good candidate for relationship mapping
Quality Notes
• 1 document has < 500 characters (politics-of-classification-resmini)
• 1 document exceeds 15K characters (may dominate topic models)
7 Sara - C-Level Executive
8 Laura - VP Marketing Nonprofit
9 Ben - Product Manager
16 Sue - Sr. Content Strategist
Graph Information
This force-directed graph shows how similar the documents are to one another, measured by the cosine similarity of their TF-IDF vectors. Nodes represent articles, and edges connect similar articles. Use the threshold slider to adjust which relationships are visible.
Click a node to highlight its connections and see related articles.
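The similarity edges described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code behind this page: the tokenizer (lowercase, whitespace split), the smoothed IDF formula, the function names (tfidf_vectors, cosine, similarity_edges), the sample articles, and the threshold value are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (raw TF times smoothed IDF)."""
    # Assumed tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_edges(docs, threshold):
    """Return (i, j, score) for every document pair scoring at or above threshold."""
    vectors = tfidf_vectors(docs)
    edges = []
    for i, j in combinations(range(len(docs)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            edges.append((i, j, score))
    return edges


# Hypothetical sample texts standing in for the articles.
articles = [
    "labels and navigation in information architecture",
    "navigation labels for information architecture practice",
    "classification politics and power",
]
edges = similarity_edges(articles, threshold=0.3)
for i, j, score in edges:
    print(f"{i} -- {j}  similarity={score:.3f}")
```

Raising the threshold drops weaker edges and lowering it adds them, which is the effect the slider has on the graph. The edge list can then be handed to any force-directed layout.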