This repository has been archived by the owner on Mar 9, 2021. It is now read-only.

Fetch content from Twitter URLs? #37

Open
rubdottocom opened this issue Mar 23, 2014 · 4 comments

Comments

@rubdottocom

Hi!
I'm trying to fetch content from the URLs inside a tweet.

When a tweet is shared from the official Twitter Android app, Twitter only passes my app a text like "read this tweet from @user at http://twitter.com/status/8341234812634".

So I fetch this URL, hoping to get the real tweet text along with the real URL that I want to fetch.
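
The first step described above, pulling the status URL out of the shared text, could be sketched roughly like this. It is a minimal illustration in plain Java, not code from snacktory; the class name `SharedTweetUrl`, the method `extractStatusUrl`, and the regular expression are assumptions based only on the sample text quoted above, and real shared texts may use other hosts or formats.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SharedTweetUrl {

    // Assumed shape of a tweet link: an http(s) URL on twitter.com whose path
    // contains "status/" followed by digits. Other link shapes are not covered.
    private static final Pattern STATUS_URL =
            Pattern.compile("https?://(?:www\\.|mobile\\.)?twitter\\.com/\\S*status/\\d+");

    // Returns the first matching status URL in the shared text, if there is one.
    static Optional<String> extractStatusUrl(String sharedText) {
        if (sharedText == null) {
            return Optional.empty();
        }
        Matcher m = STATUS_URL.matcher(sharedText);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String shared = "read this tweet from @user at http://twitter.com/status/8341234812634";
        System.out.println(extractStatusUrl(shared).orElse("no status URL found"));
    }
}
```

Returning an `Optional` keeps the "no link in the shared text" case explicit, so the caller can skip the fetch instead of passing a malformed URL on.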

However, when I do that, Twitter responds with a notice saying that I must accept the use of cookies: "To bring you Twitter, we and our partners use cookies on our and other websites. Cookies help personalize Twitter content, tailor Twitter Ads, measure their performance and provide you with a better, faster, safer Twitter experience. By using our services, you agree to our Cookie Use. Close".

I tried setting a "User-Agent" header and a cookie configuration on the HttpURLConnection before fetching from Twitter, without success.

Do you know how I can achieve that?

This is my current code (a bit dirty; I plan to send you a fix once it works).

public String fetchAsString(String urlAsString, int timeout, boolean includeSomeGooseOptions)
        throws MalformedURLException, IOException {
    HttpURLConnection hConn = createUrlConnection(urlAsString, timeout, includeSomeGooseOptions);
    hConn.setInstanceFollowRedirects(true);

    // Start "hack"
    hConn.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
    Log.d("EXTRACT", hConn.getRequestProperty("User-Agent"));
    CookieManager cookieManager = new CookieManager();
    CookieHandler.setDefault(cookieManager);

    HttpCookie cookie = new HttpCookie("lang", "en");
    cookie.setDomain("twitter.com");
    cookie.setPath("/");
    cookie.setVersion(0);
    try {
        cookieManager.getCookieStore().add(new URI("http://twitter.com/"), cookie);
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
    // End "hack"

    String encoding = hConn.getContentEncoding();
    InputStream is;
    if (encoding != null && encoding.equalsIgnoreCase("gzip")) {
        is = new GZIPInputStream(hConn.getInputStream());
    } else if (encoding != null && encoding.equalsIgnoreCase("deflate")) {
        is = new InflaterInputStream(hConn.getInputStream(), new Inflater(true));
    } else {
        is = hConn.getInputStream();
    }

    String enc = Converter.extractEncoding(hConn.getContentType());
    String res = createConverter(urlAsString).streamToString(is, enc);
    if (logger.isDebugEnabled())
        logger.debug(res.length() + " FetchAsString:" + urlAsString);
    return res;
}
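
One detail in the snippet above that may matter is ordering: the CookieManager is installed, and the cookie added, after createUrlConnection has presumably already created the connection object. The following is a minimal sketch, assuming plain JDK classes (Java 9+ for `readAllBytes`) rather than snacktory or Android, that installs the handler and seeds the cookie before the connection is opened. The names `CookieFetchSketch`, `fetchWithCookie`, and `demo`, and the local echo server, are hypothetical and exist only so the example runs without contacting Twitter. It shows only that the header and cookie reach a server, not that Twitter would then return the tweet.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CookieFetchSketch {

    // Installs a process-wide cookie handler, seeds one cookie for the target
    // URL, and only then opens the connection and sets the User-Agent.
    static String fetchWithCookie(String urlAsString, String userAgent, HttpCookie cookie)
            throws Exception {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager); // note: global side effect
        manager.getCookieStore().add(new URI(urlAsString), cookie);

        HttpURLConnection conn = (HttpURLConnection) new URL(urlAsString).openConnection();
        conn.setInstanceFollowRedirects(true);
        conn.setRequestProperty("User-Agent", userAgent); // must happen before connecting
        try (InputStream is = conn.getInputStream()) {
            return new String(is.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Starts a throwaway local server that echoes the User-Agent and Cookie
    // headers it received, so the effect of the setup above can be inspected.
    static String demo() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/", exchange -> {
            String body = "ua=" + exchange.getRequestHeaders().getFirst("User-Agent")
                    + "\ncookie=" + exchange.getRequestHeaders().getFirst("Cookie");
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes);
            }
        });
        server.start();
        try {
            HttpCookie cookie = new HttpCookie("lang", "en");
            cookie.setPath("/");   // a path is needed for the cookie to match the request
            cookie.setVersion(0);  // Netscape-style "name=value" header
            String url = "http://localhost:" + server.getAddress().getPort() + "/";
            return fetchWithCookie(url, "Mozilla/5.0 (sketch)", cookie);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Pointing the same helper at an echo endpoint first is a cheap way to check what the request really carries before concluding that the remote site ignores the cookie.
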
@karussell
Owner

Why not use the official Twitter API? I think they don't like scraping ;)

@karussell
Owner

BTW: Personally, I would also be interested in scraping Twitter ;)
BTW2: snacktory normally accepts all cookies already. See HtmlFetcher:

static {
    SHelper.enableCookieMgmt();
    SHelper.enableUserAgentOverwrite();
    SHelper.enableAnySSL();
}

@rubdottocom
Author

Well... I don't want users to have to authenticate with Twitter in my app, so I only receive content from other apps through Android's Share option.

I'll investigate further, thanks.

@karussell
Owner

I fear you'll need to do some JavaScript hacks. Or investigate how blind users, text browsers like Lynx, or JS-disabled browsers can access Twitter. Also, RSS does not seem to work anymore: https://twitter.com/timetabling.rss

rborer pushed a commit to finity-ai/snacktory that referenced this issue Jun 19, 2017
/home/vagrant/snacktory/src/main/java/de/jetwick/snacktory/ArticleTextExtractor.java:903: error: bad use of '>'
[error]      * www.test.airpr.com -> Won't work, always expect a topLevelPrivateDomain
[error]