Create a phrase net with R

Does anyone know of an R package, or have a way, to create phrase nets like this? [image]

r data-visualization text-mining

— Tyler Rinker

I hope this makes sense. I threw it together somewhat hastily, but it seems to do what you want. I took the test text from the bounty hyperlink above. The code shows the words that follow a given word, along with the proportion of times each of them occurred. It does nothing for the visualization, though I'm sure that would not be impossible to build. It should handle most of the background math.

library(tau)

#this will load the string
x <- tokenize("Questions must be at least 2 days old to be eligible for a bounty. There can only be 1 active bounty per question at any given time. Users must have at least 75 reputation to offer a bounty, and may only have a maximum of 3 active bounties at any given time. The bounty period lasts 7 days. Bounties must have a minimum duration of at least 1 day. After the bounty ends, there is a grace period of 24 hours to manually award the bounty. If you do not award your bounty within 7 days (plus the grace period), the highest voted answer created after the bounty started with at least 2 upvotes will be awarded half the bounty amount. If there's no answer meeting that criteria, the bounty is not awarded to anyone. If the bounty was started by the question owner, and the question owner accepts an answer during the bounty period, and the bounty expires without an explicit award – we assume the bounty owner liked the answer they accepted and award it the full bounty amount at the time of bounty expiration. In any case, you will always give up the amount of reputation specified in the bounty, so if you start a bounty, be sure to follow up and award your bounty to the best answer! As an additional bonus, bounty awards are immune to the daily reputation cap and community wiki mode.")

#the number of tokens in the string
n <- length(x)

#drop the space tokens, leaving only the words
tokens <- x[x != " "]

#the unique words in the string
y <- unique(tokens)

#number of tokens in the string
n <- length(tokens)
#number of distinct tokens
m <- length(y)

#replace each token by its index in y
val <- match(tokens, y)


d <- array(0, c(m, m))

#count how often each word (row) is directly followed by each word (column)
for (i in 1:(n-1)) {
   d[val[i], val[i+1]] <- d[val[i], val[i+1]] + 1
}

#pick a word
word <- 4

#show the word
y[word]
#[1] "at"

#the words that follow
y[which(d[word,] > 0)]
#[1] "least" "any"   "the" 

#the proportion of times each of those words follows
d[word,which(d[word,]>0)]/sum(d[word,])
#[1] 0.5714286 0.2857143 0.1428571
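
For comparison, here is a more compact sketch of the same successor counting in base R only. It is not the code above: it assumes a simple regex tokenizer in place of tau's tokenize() (so punctuation is handled differently), uses a short made-up sentence, and the helper name follow_prob is hypothetical.

```r
# Short made-up test sentence (assumption: not the bounty text above).
text <- "the bounty ends at the end of the bounty period"

# Assumption: lowercase everything and split on any run of characters
# that is not a letter, digit or apostrophe.
words <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
words <- words[nzchar(words)]

# Pair every word with the word that directly follows it.
first  <- head(words, -1)
second <- tail(words, -1)

# Shared factor levels make rows and columns cover the same vocabulary.
vocab  <- unique(words)
counts <- table(factor(first,  levels = vocab),
                factor(second, levels = vocab))

# Hypothetical helper: proportions of the words that follow `word`.
follow_prob <- function(word) {
  row <- counts[word, ]
  row <- row[row > 0]
  row / sum(row)
}

follow_prob("the")
# "bounty" follows "the" in 2 of 3 cases, "end" in 1 of 3
```

Here counts plays the role of d, with the words themselves as row and column names, so no separate index vector is needed.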

— darrelkj

This takes some big steps toward a plot that looks closer to the one above. It is really the drawing/visualization part that I'm struggling with. The display almost resembles a word cloud (size = frequency), and the arrows resemble a sociogram in network analysis, except that the arrows convey meaning because they represent a stronger connection. I think the work you've done will help with drawing the arrows. I'm really not that familiar with network analysis and visualization, so I need a lot of help here.

— Tyler Rinker

Add this at the end to get a plot. It will be clear, however, that you probably want to filter out the lower-ranked words and use only those with larger support. dd <- t(d); library(diagram); plotmat(dd[1:10, 1:10], box.size = 0.05, name = y[1:10], lwd = 2 * dd[1:10, ])
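
One way the filtering mentioned here could look, as a rough sketch only. It assumes "support" means the total number of times a word appears as predecessor or successor, uses a small made-up count matrix in place of the real d, and picks k = 3 arbitrarily.

```r
# Made-up successor counts: d[i, j] is how often word j follows word i.
y <- c("the", "bounty", "at", "least", "period")
d <- matrix(c(0, 5, 0, 0, 1,
              1, 0, 2, 0, 1,
              1, 0, 0, 4, 0,
              0, 0, 0, 0, 0,
              0, 0, 0, 0, 0),
            nrow = 5, byrow = TRUE, dimnames = list(y, y))

# Assumed definition of support: appearances as first or second word.
support <- rowSums(d) + colSums(d)

# Keep the k best-supported words (k = 3 is an arbitrary choice).
k <- 3
keep <- names(sort(support, decreasing = TRUE))[seq_len(min(k, length(support)))]

# Sub-matrix restricted to those words; drop = FALSE keeps it a matrix.
d_top <- d[keep, keep, drop = FALSE]
d_top
```

The transposed d_top, with keep as the names, could then stand in for the 1:10 slice in the plotmat call.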

— darrelkj

@darrelkj This seems to be limited to 10 words, but I think with a bit of work connecting it to sociograms or something similar we'd have a pretty polished function. I'm marking this answer as correct. darrelkj, after this much work you should put the finishing touches on it and throw it into a package. If you do, let us know. Thanks for your help.

— Tyler Rinker

It is not limited to 10; I just didn't want to use the whole array. The ten used here are also poorly chosen.

— Darrelkj

I stand corrected. I had made a mistake in the code when I tried it, which is why I got an out-of-bounds error. You're quite right.

— Tyler Rinker

You can create phrase nets with Many Eyes, which is a sort of "official" home for this visualization technique. There you can upload your data (probably a body of text), choose "Phrase Net" as the visualization technique, and get what you're looking for.

In fact, your illustration comes from the Phrase Net page on Many Eyes.

— Carlos Accioly

Yes, I'm aware of that, but I was hoping to do it in R because of its flexibility. You can change all sorts of parameters to display the data better, which you can't do with Many Eyes.

— Tyler Rinker

With the igraph package you can create and plot a graph while controlling every aspect of it. The graph and Rgraphviz packages work together to define and plot graphs. Both options offer a lot of control. (graphviz is also a standalone package, so you can use any kind of software to generate the graph and have graphviz display it.)

Of course, you will need to convert your data into a graph, along the lines @darrelkj suggests.
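
A minimal sketch of that conversion, under some assumptions: the counts are a made-up stand-in for a real successor matrix, label size is tied to a crude frequency proxy, and the scaling constants are arbitrary. The edge list is built in base R; the drawing step only runs if igraph is installed.

```r
# Made-up successor counts: rows are the first word, columns the follower.
words <- c("the", "bounty", "at", "least")
d <- matrix(c(0, 5, 0, 0,
              1, 0, 2, 0,
              1, 0, 0, 4,
              0, 0, 0, 0),
            nrow = 4, byrow = TRUE, dimnames = list(words, words))

# Flatten the non-zero cells into a from/to/weight edge list.
idx <- which(d > 0, arr.ind = TRUE)
edges <- data.frame(from   = rownames(d)[idx[, "row"]],
                    to     = colnames(d)[idx[, "col"]],
                    weight = d[idx],
                    stringsAsFactors = FALSE)

# Frequency proxy (assumption): total count over in- and out-edges.
freq <- rowSums(d) + colSums(d)

# Drawing is optional and skipped when igraph is not available.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = TRUE,
                                     vertices = data.frame(name = words))
  plot(g,
       vertex.shape     = "none",  # labels only, as in a word cloud
       vertex.label.cex = 0.8 + freq[igraph::V(g)$name] / max(freq),
       edge.width       = igraph::E(g)$weight,  # thicker arrow = more frequent pair
       edge.arrow.size  = 0.5)
}
```

Label size follows word frequency and arrow width follows pair frequency, which is roughly the look described in the question; the layout and colors are left at igraph's defaults.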

— Wayne