ta.TwitterAnalysis.eda_analysis
TwitterAnalysis.eda_analysis()
Method to print a summary of the initial exploratory data analysis for any dataset.
Examples
>>> # Create Exploratory Data Analysis files
>>> eda_analysis()
The summary includes the following metrics:
Tweet counts: The number of tweet documents in the database, divided into the following categories.
Total Original Tweets: The number of tweet documents in the database that are original tweets.
Total Replies: The number of tweet documents in the database that are replies to another tweet.
Total Retweets: The number of tweet documents in the database that are retweets.
Total Tweets: The total number of tweet documents in the database.
Tweet counts by language: The number of tweet documents for each language used in the tweets.
Tweet counts by month: The number of tweet documents for each month/year.
Tweet counts by file: The number of tweet documents imported from each of the JSON files.
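As a rough illustration of the tweet counts described above, the following sketch tallies a small list of tweet documents by type, language, and month/year. It is not the library's implementation: the classification rule and the field names (retweeted_status, in_reply_to_status_id, lang, created_at, as found in Twitter API v1.1 tweet JSON) are assumptions, and classify_tweet and tweet_counts are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout of the "created_at" field in Twitter API v1.1 JSON.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def classify_tweet(doc):
    """Return 'retweet', 'reply', or 'original' for one tweet document.

    Assumption: a retweet carries a 'retweeted_status' object and a reply
    carries a non-null 'in_reply_to_status_id'.
    """
    if "retweeted_status" in doc:
        return "retweet"
    if doc.get("in_reply_to_status_id") is not None:
        return "reply"
    return "original"


def tweet_counts(docs):
    """Count tweet documents by type, by language, and by month/year."""
    docs = list(docs)
    by_type = Counter(classify_tweet(d) for d in docs)
    by_lang = Counter(d.get("lang", "und") for d in docs)
    by_month = Counter(
        datetime.strptime(d["created_at"], CREATED_AT_FORMAT).strftime("%m/%Y")
        for d in docs
    )
    return by_type, by_lang, by_month


# Made-up sample documents, for illustration only.
sample = [
    {"created_at": "Mon Jan 06 10:00:00 +0000 2020", "lang": "en"},
    {"created_at": "Tue Jan 07 11:30:00 +0000 2020", "lang": "en",
     "in_reply_to_status_id": 1},
    {"created_at": "Sat Feb 01 09:15:00 +0000 2020", "lang": "es",
     "retweeted_status": {"id": 2}},
]

by_type, by_lang, by_month = tweet_counts(sample)
print("Total Tweets:", sum(by_type.values()))
print(dict(by_type), dict(by_lang), dict(by_month))
```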
User counts: The number of users in the database, divided into the following categories.
tweet: Users with at least one document in the database.
retweet: Users that were retweeted, but are not part of the previous group.
quote: Users that were quoted, but are not part of the previous groups.
reply: Users that were replied to, but are not part of the previous groups.
mention: Users that were mentioned, but are not part of the previous groups.
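The user categories above are mutually exclusive: each user is counted only in the first category that applies, in the order listed. A minimal sketch of that counting rule, assuming the users for each role have already been collected into sets (user_counts and the sample data are hypothetical, not part of the library):

```python
# Order in which categories claim users, as listed in the description above.
CATEGORY_ORDER = ["tweet", "retweet", "quote", "reply", "mention"]


def user_counts(users_by_role):
    """Count users per category, skipping users already counted earlier."""
    seen = set()
    counts = {}
    for role in CATEGORY_ORDER:
        new_users = set(users_by_role.get(role, ())) - seen
        counts[role] = len(new_users)
        seen |= new_users
    return counts


# Made-up screen names, for illustration only.
sample = {
    "tweet": {"alice", "bob"},
    "retweet": {"bob", "carol"},          # bob already counted under "tweet"
    "quote": {"carol", "dave"},           # carol already counted under "retweet"
    "reply": {"alice", "erin"},
    "mention": {"erin", "frank", "bob"},
}

counts = user_counts(sample)
print(counts)
```

Because each user is counted once, the per-category counts add up to the number of distinct users across all roles.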
All User Connections Graph: The metrics for the graph built from users connected by retweets, quotes, mentions, and replies.
# of Nodes: The total number of nodes in the graph.
# of Edges: The total number of edges in the graph.
# of Nodes of the largest connected components: The total number of nodes in the largest connected component of the graph.
# of Edges of the largest connected components: The total number of edges in the largest connected component of the graph.
# of Disconnected Graphs: The number of sub-graphs within the main graph that are not connected to each other.
# of Louvain Communities found in the largest connected component: The number of communities found in the largest connected component using the Louvain method.
Degree of the top 5 most connected users: The 5 users with the highest degree, listed by screen name with their respective degrees.
Average Node Degree of largest connected graph: The average degree of all nodes that are part of the largest connected component of the graph.
Plot of the Louvain community distribution: A bar chart showing the node count distribution of the communities found with the Louvain method.
Disconnected graphs distribution: A plot showing the distribution of the disconnected graphs, with the total number of nodes and edges for each of them.
Mentions User Connections Graph: The same metrics as the All User Connections graph, but only considering the connections made by mentions.
Retweets User Connections Graph: The same metrics as the All User Connections graph, but only considering the connections made by retweets.
Replies User Connections Graph: The same metrics as the All User Connections graph, but only considering the connections made by replies.
HT Connection Graph: The same metrics as the All User Connections graph, but only considering the connections made by hashtags.
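To make the graph metrics above concrete, the sketch below computes most of them for a small edge list using only the standard library. It rests on several assumptions: the graph is treated as undirected and unweighted, duplicate edges and self-loops are ignored, and graph_metrics is a hypothetical helper rather than the library's code. Louvain community detection is left out because it needs a dedicated graph library.

```python
from collections import defaultdict, deque


def graph_metrics(edges, top_n=5):
    """Compute basic metrics for an undirected graph given as (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops are ignored
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Find connected components with a breadth-first search.
    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)

    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    largest = max(components, key=len) if components else set()
    # Each undirected edge contributes 2 to the sum of degrees.
    largest_edges = sum(degree[n] for n in largest) // 2
    # Sort by degree (descending), then by name so ties are deterministic.
    top = sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]

    return {
        "nodes": len(adjacency),
        "edges": sum(degree.values()) // 2,
        "largest_component_nodes": len(largest),
        "largest_component_edges": largest_edges,
        "disconnected_graphs": len(components),
        "top_degrees": top,
        "avg_degree_largest": 2 * largest_edges / len(largest) if largest else 0.0,
    }


# Made-up connections between screen names, for illustration only.
sample_edges = [
    ("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
    ("carol", "dave"), ("erin", "frank"),
]

metrics = graph_metrics(sample_edges)
for name, value in metrics.items():
    print(name, value)
```

The same function could be applied to an edge list restricted to one connection type (mentions, retweets, replies, or hashtags) to obtain the per-graph variants of these metrics.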