Commit bcf1aaa

Merge colloc_graph into main
2 parents 3352bce + 863e771 commit bcf1aaa

11 files changed, +78558 -71 lines changed

README.md

Lines changed: 14 additions & 3 deletions
@@ -203,13 +203,24 @@ tests = comparison.statistical_tests
 
 ### Collocation Networks
 
-Build and export word association networks:
+Build and export word association networks with richer metadata:
 
 ```julia
 network = colloc_graph(
-    corpus, ["climate", "change"],
-    metric=PMI, depth=2, min_score=3.0
+    corpus, ["climate", "change"];  # seed terms
+    metric=PMI,
+    depth=2,
+    min_score=2.5,
+    direction=:undirected,
+    include_frequency=true,
+    weight_normalization=:minmax,
+    compute_centrality=true,
+    centrality_metrics=[:pagerank, :betweenness]
 )
+
+first(network.edges, 5)         # includes Frequency / DocFrequency / NormalizedWeight
+first(network.node_metrics, 5)  # includes degrees, strengths & centrality scores
+
 gephi_graph(network, "nodes.csv", "edges.csv")
 ```
 

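The three `weight_normalization` modes used across this commit (`:minmax`, `:zscore`, `:rank`) are not defined anywhere in the diff. The sketch below shows the conventional meaning of each name so the `NormalizedWeight` column is easier to interpret. The helper names are hypothetical and the formulas are the textbook ones; the package may differ in details such as tie handling or the choice of standard deviation.

```julia
using Statistics  # mean and std from the standard library

# Hypothetical helpers for illustration; none of these names come from the package.

# Min-max: rescale scores linearly into [0, 1].
function minmax_normalize(w::AbstractVector{<:Real})
    lo, hi = extrema(w)
    hi == lo && return zeros(Float64, length(w))  # constant input has no spread
    return (w .- lo) ./ (hi - lo)
end

# Z-score: centre on the mean and divide by the sample standard deviation.
function zscore_normalize(w::AbstractVector{<:Real})
    s = std(w)
    (isnan(s) || s == 0) && return zeros(Float64, length(w))
    return (w .- mean(w)) ./ s
end

# Rank: ordinal rank (1 = smallest score) divided by the number of scores.
# Ties are broken by position, which is an assumption of this sketch.
function rank_normalize(w::AbstractVector{<:Real})
    ranks = invperm(sortperm(w))
    return ranks ./ length(w)
end

scores = [2.5, 4.0, 3.0, 7.5]
println(minmax_normalize(scores))
println(zscore_normalize(scores))
println(rank_normalize(scores))
```

Min-max and rank both yield values bounded by 1, while z-scores are unbounded and can be negative, which matters if the normalized weight is later used as an edge width or layout strength in Gephi.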
data_austen/Emma.txt

Lines changed: 16870 additions & 0 deletions
Large diffs are not rendered by default.

data_austen/Mansfield_Park.txt

Lines changed: 16048 additions & 0 deletions
Large diffs are not rendered by default.

data_austen/Northanger_Abbey.txt

Lines changed: 8373 additions & 0 deletions
Large diffs are not rendered by default.

data_austen/Persuasion.txt

Lines changed: 8736 additions & 0 deletions
Large diffs are not rendered by default.

data_austen/Pride_and_Prejudice.txt

Lines changed: 14911 additions & 0 deletions
Large diffs are not rendered by default.

data_austen/Sense_and_Sensibility.txt

Lines changed: 13053 additions & 0 deletions
Large diffs are not rendered by default.

docs/src/guide/corpus_analysis.md

Lines changed: 13 additions & 6 deletions
@@ -297,18 +297,25 @@ network = colloc_graph(
     min_score=-10.0,
     max_neighbors=5,
     windowsize=5,
-    minfreq=1
+    minfreq=1,
+    direction=:undirected,
+    include_frequency=true,
+    weight_normalization=:zscore,
+    compute_centrality=true
 )
 
 println("\nCollocation Network:")
 println("  Nodes: $(length(network.nodes))")
 println("  Edges: $(nrow(network.edges))")
 
-if !isempty(network.edges)
-    println("\nStrongest connections:")
-    for row in eachrow(first(sort(network.edges, :Weight, rev=true), 5))
-        println("  $(row.Source) → $(row.Target): $(round(row.Weight, digits=2))")
-    end
+println("\nTop weighted connections:")
+for row in eachrow(first(sort(network.edges, :Weight, rev=true), 5))
+    println("  $(row.Source) ↔ $(row.Target): score=$(round(row.Weight, digits=2)), freq=$(row.Frequency)")
+end
+
+println("\nCentrality (Pagerank):")
+for row in eachrow(first(sort(network.node_metrics, :Centrality_pagerank, rev=true), 5))
+    println("  $(row.Node): pagerank=$(round(row.Centrality_pagerank, digits=4))")
 end
 ```
 

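The hunk above prints a `Centrality_pagerank` column but the diff does not show how it is computed. As a rough guide to what such a score means on an undirected, weighted collocation graph, here is a minimal power-iteration sketch. The function name, the tuple-based edge list, and the damping value of 0.85 are all assumptions for illustration, and the sketch requires non-negative weights, so it would not apply directly to raw scores admitted by `min_score=-10.0`.

```julia
# Hypothetical sketch: weighted PageRank by power iteration on an undirected edge list.
# Assumes every weight is non-negative; this is not the package's implementation.
function pagerank_sketch(edges::Vector{Tuple{String,String,Float64}};
                         damping::Float64=0.85, iters::Int=200, tol::Float64=1e-10)
    nodes = sort!(unique(vcat(first.(edges), getindex.(edges, 2))))
    index = Dict(name => i for (i, name) in enumerate(nodes))
    n = length(nodes)

    # Symmetric weighted adjacency matrix, since the graph is undirected.
    A = zeros(n, n)
    for (s, t, w) in edges
        A[index[s], index[t]] += w
        A[index[t], index[s]] += w
    end

    strength = vec(sum(A, dims=2))  # total incident weight per node
    r = fill(1 / n, n)              # start from a uniform distribution
    for _ in 1:iters
        next = fill((1 - damping) / n, n)
        for j in 1:n
            strength[j] == 0 && continue
            # Node j passes its score to neighbours in proportion to edge weight.
            next .+= damping .* r[j] .* A[:, j] ./ strength[j]
        end
        converged = sum(abs.(next .- r)) < tol
        r = next
        converged && break
    end
    return Dict(nodes[i] => r[i] for i in 1:n)
end

toy_edges = [("climate", "change", 3.0), ("climate", "policy", 2.0)]
scores = pagerank_sketch(toy_edges)
for (node, score) in sort(collect(scores), by=last, rev=true)
    println("  $node: pagerank=$(round(score, digits=4))")
end
```

In this toy graph the hub term `climate` ranks first because both other terms pass all of their score to it, and `change` outranks `policy` because its edge carries more weight.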
docs/src/index.md

Lines changed: 8 additions & 2 deletions
@@ -220,12 +220,18 @@ network = colloc_graph(
     corpus, ["innovation"],
     metric=LLR,
     depth=1,
-    minfreq=1
+    minfreq=1,
+    include_frequency=true,
+    weight_normalization=:rank,
+    compute_centrality=true
 )
 
-network.node_metrics
+first(network.edges, 5)
 ```
 
+The returned `node_metrics` table now includes degree/strength totals and optional
+centrality scores, providing a quick overview of the structural role of each term.
+
 ### Comparative Analysis
 
 Compare associations across subcorpora:

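The sentence added to `docs/src/index.md` mentions degree and strength totals in `node_metrics` without defining them. Under the usual graph-theory reading, degree counts the edges touching a node and strength sums their weights. The sketch below computes both from a plain edge list; the function name, the tuple layout, and the example terms are hypothetical, and the actual column names in the package are not shown in this diff.

```julia
# Hypothetical sketch: per-node degree and strength for an undirected edge list.
function degree_strength(edges::Vector{Tuple{String,String,Float64}})
    degree = Dict{String,Int}()
    strength = Dict{String,Float64}()
    for (s, t, w) in edges
        for node in (s, t)  # an undirected edge counts for both endpoints
            degree[node] = get(degree, node, 0) + 1
            strength[node] = get(strength, node, 0.0) + w
        end
    end
    return degree, strength
end

toy = [("innovation", "technology", 12.5),
       ("innovation", "policy", 4.0),
       ("technology", "policy", 1.5)]
deg, str = degree_strength(toy)
println("degree:   ", deg)
println("strength: ", str)
```

Comparing the two is often informative: a term with high degree but low strength has many weak associations, while the reverse suggests a few strong ones.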