synapsomorphy > 03-09-2025, 03:22 AM
quimqu > 03-09-2025, 09:12 AM
quimqu > 03-09-2025, 12:03 PM
![[Image: axcJsah.gif]](https://i.imgur.com/axcJsah.gif)
![[Image: 0G5ZKmt.gif]](https://i.imgur.com/0G5ZKmt.gif)
![[Image: FbDNZ0Z.gif]](https://i.imgur.com/FbDNZ0Z.gif)
quimqu > 03-09-2025, 04:21 PM
(03-09-2025, 03:22 AM)synapsomorphy Wrote: This is extremely cool! Is your code available? I'd love to see how you're doing it.
RenegadeHealer > 03-09-2025, 10:04 PM
quimqu > 04-09-2025, 03:36 PM
(03-09-2025, 10:04 PM)RenegadeHealer Wrote: quimqu, have you looked into any of the efforts to restore the original order of the VMS folios?
quimqu > 21-09-2025, 06:53 PM
![[Image: v6tNPxL.png]](https://i.imgur.com/v6tNPxL.png)
![[Image: D01thGY.png]](https://i.imgur.com/D01thGY.png)
![[Image: Zs1DuKt.png]](https://i.imgur.com/Zs1DuKt.png)
![[Image: 4wnRjm9.png]](https://i.imgur.com/4wnRjm9.png)
Jorge_Stolfi > 21-09-2025, 08:44 PM
quimqu > 21-09-2025, 09:28 PM
(21-09-2025, 08:44 PM)Jorge_Stolfi Wrote: Just to clarify: in your analysis you take the parags (only, excluding labels and titles) of each page, do some cleanup, compute the token frequency distribution in those parags for each page, and then compare the distributions of different pages to identify "topics", by various criteria. Is that so?
Jorge_Stolfi > 21-09-2025, 09:45 PM
(21-09-2025, 09:28 PM)quimqu Wrote: Then I turn each paragraph into a bag-of-words
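For anyone unfamiliar with the term: a bag-of-words keeps only how often each token occurs in a paragraph and throws away the word order. A minimal sketch in Python, assuming whitespace-separated tokens. The sample paragraphs and the helper name `bag_of_words` are made up for illustration and are not quimqu's actual code.

```python
from collections import Counter

def bag_of_words(paragraphs):
    """Return (vocabulary, count matrix) for a list of paragraph strings."""
    # One Counter per paragraph: token -> number of occurrences.
    bags = [Counter(p.split()) for p in paragraphs]
    # Shared, sorted vocabulary so every row has the same columns.
    vocab = sorted(set().union(*bags))
    # Row d, column w = how often vocab[w] appears in paragraph d.
    matrix = [[bag[w] for w in vocab] for bag in bags]
    return vocab, matrix

# Hypothetical paragraphs, for illustration only.
paragraphs = [
    "daiin chedy qokeedy daiin",
    "chedy shedy chedy",
]
vocab, matrix = bag_of_words(paragraphs)
```

Each row of `matrix` is one paragraph and each column is one vocabulary entry, which is the usual input shape for a topic model.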
(21-09-2025, 09:28 PM)quimqu Wrote: The model learns two things: a set of word distributions (the topics themselves) and, for each paragraph, a vector of proportions telling how those topics are mixed. You choose K, the number of topics; too small and themes are merged, too large and they fragment.
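The two outputs described in this quote (per-topic word distributions and per-paragraph topic proportions) match what a mixture-style topic model such as LDA produces. The quoted text does not say which algorithm or library was used, so the following is only a sketch under that assumption: a small collapsed Gibbs sampler in plain Python. The function name `lda_gibbs`, the hyperparameters `alpha`, `beta` and `iters`, and the sample tokens are all illustrative choices, not taken from quimqu's code.

```python
import random

def lda_gibbs(docs, K, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not optimized).

    docs: list of token lists, one per paragraph.
    Returns (vocab, phi, theta):
      phi[k][w]   = probability of word w under topic k
      theta[d][k] = proportion of topic k in paragraph d
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ids = [[index[w] for w in doc] for doc in docs]

    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial assignment of a topic to every token.
    for d, doc in enumerate(ids):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(ids):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed estimates from the final counts; every row sums to 1.
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(ids)]
    return vocab, phi, theta

# Hypothetical paragraphs, for illustration only.
docs = [p.split() for p in [
    "daiin chedy daiin qokeedy",
    "chedy shedy chedy qokeedy",
    "otol okal otol okar",
]]
vocab, phi, theta = lda_gibbs(docs, K=2)
```

Here `phi` has K rows (one word distribution per topic) and `theta` has one row per paragraph. Rerunning with different values of `K` is the simplest way to see the merging and fragmenting trade-off mentioned in the quote.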