ta.TwitterAnalysis.build_db_collections

TwitterAnalysis.build_db_collections(inc=100000, bots_ids_list_file=None)

Extracts, cleans, and loads the data into all of the collections in MongoDB.
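The extract, clean, and load steps can be pictured as a loop over fixed-size batches, where the batch size plays the role of the `inc` parameter described below. The following is a minimal, hypothetical sketch of that pattern, not this method's actual implementation: the helper names (`batched`, `clean`, `build_collections`), the tweet fields, and the cleaning rule are all assumptions, and a plain list stands in for a MongoDB collection.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(records: Iterable[dict], inc: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `inc` records (hypothetical helper)."""
    it = iter(records)
    while True:
        batch = list(islice(it, inc))
        if not batch:
            return
        yield batch


def clean(tweet: dict) -> dict:
    """Hypothetical cleaning step: keep a few fields and strip whitespace."""
    return {
        "id": tweet["id"],
        "user_id": tweet["user_id"],
        "text": tweet.get("text", "").strip(),
    }


def build_collections(raw_tweets: Iterable[dict], inc: int = 100000) -> List[dict]:
    """Extract, clean, and 'load' tweets one batch at a time.

    Only one batch of `inc` raw tweets is materialized at a time, which is
    why a larger `inc` needs more memory. A plain list stands in for the
    MongoDB collection; with pymongo the load step could be insert_many().
    """
    collection: List[dict] = []
    for batch in batched(raw_tweets, inc):
        cleaned = [clean(t) for t in batch]
        collection.extend(cleaned)
    return collection


raw = [{"id": i, "user_id": i % 3, "text": f"  tweet {i}  "} for i in range(5)]
docs = build_collections(raw, inc=2)  # processed as batches of 2, 2, and 1
print(len(docs), docs[0]["text"])  # 5 tweet 0
```

Trading batch size against memory in this way is what makes `inc` hardware-dependent: each iteration holds one batch of raw and cleaned tweets in memory.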

Parameters
  • inc (Optional) – the number of tweets processed per batch (Default=100000). A large value may cause out-of-memory errors, while a small value may make the run take a long time, so choose it based on the hardware specification.

  • bots_ids_list_file (Optional) – a file containing a list of user ids that belong to bots. When provided, flags are created in the MongoDB collections to identify which tweets and users are in the bots list (Default=None).
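As an illustration of how a bots list could become per-tweet flags, here is a hypothetical sketch. It assumes the file holds one user id per line and uses an invented `is_bot` field name; the real file format and flag names used by this method are not documented here. Users could be flagged the same way.

```python
from typing import Iterable, List, Set


def load_bot_ids(lines: Iterable[str]) -> Set[str]:
    """Collect user ids, assuming one id per line; blank lines are ignored."""
    return {line.strip() for line in lines if line.strip()}


def flag_bots(tweets: Iterable[dict], bot_ids: Set[str]) -> List[dict]:
    """Return copies of the tweets with a hypothetical 'is_bot' flag added.

    Ids are compared as strings because ids read from a text file are strings.
    """
    return [{**t, "is_bot": str(t["user_id"]) in bot_ids} for t in tweets]


# In practice the lines would come from the file, e.g.:
#   with open(bots_ids_list_file, encoding="utf-8") as f:
#       bot_ids = load_bot_ids(f)
bot_ids = load_bot_ids(["42\n", "\n", "7\n"])
tweets = [{"id": 1, "user_id": 42}, {"id": 2, "user_id": 5}]
flagged = flag_bots(tweets, bot_ids)
print([t["is_bot"] for t in flagged])  # [True, False]
```

Using a set makes each membership check constant time on average, which matters when flagging tweets batch by batch.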

Examples

Load all data into all collections in MongoDB:

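>>> # assumes my_analysis is an already-constructed TwitterAnalysis object (name is illustrative)
>>> inc = 50000
>>> my_analysis.build_db_collections(inc)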