script/rest/search_twitter.py
author ymh <ymh.work@gmail.com>
Fri, 21 Sep 2018 12:26:32 +0200
changeset 1447 64fe57aef309
parent 1031 5d301c2ddb89
permissions -rw-r--r--
add streaming code for Serpentine's saturday 22/09 afternoon

import argparse
import re
import sys
import urlparse

import anyjson
import twitter

from iri_tweet import models, processor, utils


def get_options():
    # argparse adds the "usage: " prefix itself, so the string must not repeat it.
    usage = "%(prog)s [options] <connection_str_or_filepath>"

    parser = argparse.ArgumentParser(usage=usage)

    parser.add_argument(dest="conn_str", metavar="CONNECTION_STR",
                        help="write tweets to DATABASE; a connection string or a file path")
    parser.add_argument("-Q", dest="query", metavar="QUERY",
                        help="search query")
    parser.add_argument("-P", dest="rpp", metavar="RPP", default="100",
                        help="results per page")
    parser.add_argument("-t", dest="token_filename", metavar="TOKEN_FILENAME", default=".oauth_token",
                        help="token file name")
    parser.add_argument("-k", "--key", dest="consumer_key", metavar="CONSUMER_KEY",
                        help="Twitter consumer key")
    parser.add_argument("-s", "--secret", dest="consumer_secret", metavar="CONSUMER_SECRET",
                        help="Twitter consumer secret")

    # Adds the logging options (options.verbose and options.quiet are read later).
    utils.set_logging_options(parser)

    return parser.parse_args()


def get_auth(options, access_token):
    # access_token is indexed as (token, token_secret).
    return twitter.OAuth(token=access_token[0], token_secret=access_token[1],
                         consumer_key=options.consumer_key,
                         consumer_secret=options.consumer_secret)


def get_max_id(results):
    # The next page is described by a query string in
    # search_metadata.next_results; return its max_id, or 0 if absent.
    next_results = results.get('search_metadata', {}).get('next_results', '')
    if next_results and next_results.startswith("?"):
        next_results = next_results[1:]

    max_ids = urlparse.parse_qs(next_results).get('max_id', [])
    max_id = 0
    if max_ids:
        max_id = int(max_ids[0])
    return max_id


if __name__ == "__main__":

    options = get_options()

    access_token = utils.get_oauth_token(consumer_key=options.consumer_key,
                                         consumer_secret=options.consumer_secret,
                                         token_file_path=options.token_filename)
    auth = get_auth(options, access_token)

    t = twitter.Twitter(domain="api.twitter.com", api_version="1.1", secure=True, auth=auth)

    conn_str = options.conn_str.strip()
    # A value without a URL scheme is treated as the path of an SQLite file.
    if not re.match(r"^\w+://.+", conn_str):
        conn_str = 'sqlite:///' + conn_str

    engine, metadata, Session = models.setup_database(conn_str, echo=((options.verbose - options.quiet) > 0), create_all=True)
    session = None
    try:
        session = Session()
        #conn.row_factory = sqlite3.Row
        #curs = conn.cursor()
        #curs.execute("create table if not exists tweet_tweet (json);")
        #conn.commit()

        results = None
        page = 1
        print(options.query)

        # First request: used only to read the current max_id.
        results = t.search.tweets(q=options.query, result_type="recent")
        max_id = get_max_id(results)
        if max_id == 0:
            print("No results, exit")
            sys.exit(0)

        # The default for 'statuses' must be a list: len(0) would raise a TypeError.
        while page <= 1500 // int(options.rpp) and (results is None or len(results.get('statuses', [])) > 0) and max_id > 0:

            results = t.search.tweets(q=options.query, count=options.rpp, max_id=max_id, include_entities=True, result_type='recent')

            max_id = get_max_id(results)

            for tweet in results["statuses"]:
                print(tweet)
                tweet_str = anyjson.serialize(tweet)
                #invalidate user id
                p = processor.TwitterProcessorStatus(json_dict=tweet, json_txt=tweet_str, source_id=None,
                                                     session=session,
                                                     consumer_token=(options.consumer_key, options.consumer_secret),
                                                     access_token=access_token,
                                                     token_filename=options.token_filename,
                                                     user_query_twitter=False, logger=None)
                p.process()
                session.flush()
                session.commit()
            page += 1
            #session.commit()
    finally:
        if session:
            session.close()
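
The pagination in the script above hinges on `get_max_id`, which reads the `max_id` parameter out of `search_metadata.next_results`. The following is a minimal standalone sketch of that extraction, not the script's own code: the name `extract_max_id` and both sample payloads are hypothetical, and the import falls back to `urlparse` so it should run on Python 2 as well as Python 3.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2, as imported by the script


def extract_max_id(results):
    """Return the max_id found in search_metadata.next_results, or 0."""
    next_results = results.get("search_metadata", {}).get("next_results", "")
    # parse_qs expects a bare query string, so drop any leading "?".
    query = next_results.lstrip("?")
    values = parse_qs(query).get("max_id", [])
    return int(values[0]) if values else 0


# Hypothetical payloads shaped like a search response (illustrative values only).
sample_page = {
    "statuses": [{"id": 250075927172759552}],
    "search_metadata": {
        "next_results": "?max_id=249279667666817023&q=%23example&count=4",
    },
}
last_page = {"statuses": [], "search_metadata": {}}

print(extract_max_id(sample_page))
print(extract_max_id(last_page))
```

A return value of 0 serves as the "no further page" signal, which matches the `max_id > 0` condition of the `while` loop in the script.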