ta.TwitterAnalysis.eda_analysis

TwitterAnalysis.eda_analysis()

Prints a summary of the initial exploratory data analysis (EDA) for any dataset.

Examples

>>> # Create Exploratory Data Analysis files
>>> eda_analysis()

The summary includes the following metrics:

  • Tweet counts: The number of tweet documents in the database, divided into the following categories.

    • Total Original Tweets: The number of tweet documents in the database that are original tweets.

    • Total Replies: The number of tweet documents in the database that are replies to another tweet.

    • Total Retweets: The number of tweet documents in the database that are retweets.

    • Total Tweets: The total number of tweet documents in the database.

  • Tweet counts by language: The number of tweet documents for each language used in the tweets.

  • Tweet counts by month: The number of tweet documents for each month/year.

  • Tweet counts by file: The number of tweet documents imported from each of the JSON files.

  • User counts: The number of users in the database, divided into the following categories. Each user is counted only once, in the first category that applies.

    • tweet: Users with at least one document in the database.

    • retweet: Users that were retweeted, but are not part of the previous group.

    • quote: Users that were quoted, but are not part of the previous groups.

    • reply: Users that were replied to, but are not part of the previous groups.

    • mention: Users that were mentioned, but are not part of the previous groups.

  • All User Connections Graph: The metrics for the graph built from users connected by retweets, quotes, mentions, and replies.

    • # of Nodes: The total number of nodes in the graph.

    • # of Edges: The total number of edges in the graph.

    • # of Nodes of the largest connected components: The total number of nodes in the largest connected component of the graph.

    • # of Edges of the largest connected components: The total number of edges in the largest connected component of the graph.

    • # of Disconnected Graphs: The number of sub-graphs within the main graph that are not connected to each other.

    • # of Louvain Communities found in the largest connected component: The number of communities found in the largest connected component using the Louvain method.

    • Degree of the top 5 most connected users: The 5 users with the highest degree. Shows each user's screen name and degree.

    • Average Node Degree of largest connected graph: The average degree of all nodes that are part of the largest connected component of the graph.

    • Plot of the Louvain community distribution: A bar chart showing the node count distribution of the communities found with the Louvain method.

    • Disconnected graphs distribution: A plot showing the distribution of the disconnected graphs, with the total number of nodes and edges for each one.

  • Mentions User Connections Graph: The same metrics as the All User Connections graph, but only considering the connections made by mentions.

  • Retweets User Connections Graph: The same metrics as the All User Connections graph, but only considering the connections made by retweets.

  • Replies User Connections Graph: The same metrics as the All User Connections graph, but only considering the connections made by replies.

  • HT Connection Graph: The same metrics as the All User Connections graph, but only considering the connections made by hashtags (HT).
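
The user-count categories listed above are mutually exclusive: each user is counted in the first category that applies. The following is a minimal sketch of that counting rule only, not the library's implementation. The function name `count_users_by_first_role` and the input layout (a dict mapping each category to a set of screen names) are hypothetical and chosen for illustration.

```python
def count_users_by_first_role(users_by_role):
    """Count each user once, under the first category in which they appear."""
    # Priority order follows the category list in the documentation above.
    order = ["tweet", "retweet", "quote", "reply", "mention"]
    seen = set()
    counts = {}
    for role in order:
        # Keep only users that no earlier category has already claimed.
        new_users = set(users_by_role.get(role, ())) - seen
        counts[role] = len(new_users)
        seen |= new_users
    return counts


# Hypothetical sample data: screen names grouped by how they appear.
sample_users = {
    "tweet": {"ana", "bob"},
    "retweet": {"bob", "cat"},
    "quote": {"cat", "dan"},
    "reply": {"ana", "eve"},
    "mention": {"eve", "fay", "bob"},
}
user_counts = count_users_by_first_role(sample_users)
print(user_counts)
```

With this rule the category counts add up to the number of distinct users, because nobody is counted twice.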
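
Most of the connection-graph metrics listed above can be reproduced by hand on a small edge list. The sketch below uses only the Python standard library and is not the library's own code. It assumes an undirected graph and assumes that "# of Disconnected Graphs" means the number of connected components. The function name `graph_summary` and the dictionary keys are hypothetical. Louvain community detection is left out because it needs a dedicated graph library.

```python
from collections import defaultdict, deque


def graph_summary(edges, top_n=5):
    """Compute basic metrics for an undirected graph given as (u, v) pairs."""
    # Build an adjacency map; sets collapse repeated edges between two users.
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # Assumption: self-loops are ignored in this sketch.
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Find the connected components with a breadth-first search.
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)

    def edge_count(nodes):
        # Every undirected edge is seen once from each endpoint, so halve it.
        return sum(len(adjacency[n] & nodes) for n in nodes) // 2

    largest = max(components, key=len) if components else set()
    degrees = {node: len(neighbors) for node, neighbors in adjacency.items()}
    # Sort by degree (descending), then by name for a stable order.
    top = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:top_n]
    average = sum(degrees[n] for n in largest) / len(largest) if largest else 0.0

    return {
        "nodes": len(adjacency),
        "edges": edge_count(set(adjacency)),
        "largest_cc_nodes": len(largest),
        "largest_cc_edges": edge_count(largest),
        "disconnected_graphs": len(components),
        "top_degrees": top,
        "avg_degree_largest_cc": average,
    }


# Hypothetical sample: each pair is one connection between two screen names.
sample_edges = [
    ("ana", "bob"), ("bob", "cat"), ("cat", "ana"),
    ("bob", "dan"), ("eve", "fay"),
]
summary = graph_summary(sample_edges)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same function could be applied to an edge list restricted to mentions, retweets, replies, or hashtags to mirror the per-connection-type graphs.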