Question about scraping the homepage, links, and authors in lesson3 #165

Open
freedomisCode opened this issue Apr 9, 2019 · 5 comments

Comments

@freedomisCode

No description provided.

@freedomisCode
Author

// Load dependencies
var express = require('express');
var superagent = require('superagent');
var cheerio = require('cheerio');
// Create the express instance
var app = express();

app.get('/', function (req, res, next) {
  // Use superagent to fetch the content of https://cnodejs.org/
  superagent.get('https://cnodejs.org/')
    .end(function (err, sres) {
      // Standard error handling
      if (err) {
        return next(err);
      }
      // sres.text holds the page's HTML. Passing it to cheerio.load
      // returns a variable that implements the jQuery interface, conventionally named '$'
      var $ = cheerio.load(sres.text);
      var items = [];
      // Walk each topic row (.cell) and read title, link, and author from inside it
      $('.cell').each(function (idx, element) {
        var $element = $(element);
        items.push({
          title: $element.find('.topic_title').attr('title'),
          href: $element.find('.topic_title').attr('href'),
          author: $element.find('.user_avatar img').attr('title')
        });
      });
      res.send(items);
    });
});
app.listen(3000, function () {
  console.log('app is running at port 3000');
});

@freedomisCode
Author

I scraped the page row by row (one .cell per topic), following aimer1124's approach. I don't know why scraping with $('#topic_list .topic_title') doesn't work. @alsotang

@Leonardo-zyh

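// The problem is on your side; you can try this code
$('#topic_list .topic_title').each(function (idx, element) {
  var $element = $(element);
  items.push({
    title: $element.attr('title'),
    href: $element.attr('href'),
    // Climb from the title link up to its enclosing .cell row, then find the avatar image there
    author: $element.parents('.cell').find('img').attr('title')
  });
});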

@scottMan1001

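$('#topic_list .user_avatar>img').each(function (idx, element) {
  var $element = $(element);
  items2.push({
    author: $element.attr('title')
  });
});
// Copy each author onto the topic at the same index
items.forEach((item, index) => {
  item.author = items2[index].author;
});

This turned out to be more work than necessary. I looked up a separate selector for the author and added the code above on top of the original author's base code.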
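The index-based merge above only works if both selectors return the same number of elements in the same order. Below is a minimal sketch of a guard for that assumption, in plain JavaScript. `attachAuthors` and the sample data are hypothetical names used for illustration and do not come from the thread.

```javascript
// Hypothetical helper: attach authors to topics by position.
// Throws instead of silently misaligning when the two lists differ in length.
function attachAuthors(topics, authors) {
  if (topics.length !== authors.length) {
    throw new Error(
      'length mismatch: ' + topics.length + ' topics, ' + authors.length + ' authors'
    );
  }
  // Return new objects rather than mutating the inputs.
  return topics.map(function (topic, index) {
    return Object.assign({}, topic, { author: authors[index].author });
  });
}

// Made-up sample data standing in for the scraped `items` and `items2` arrays.
var sampleTopics = [{ title: 'first topic', href: '/topic/1' }];
var sampleAuthors = [{ author: 'someuser' }];
console.log(attachAuthors(sampleTopics, sampleAuthors));
```

Reading the author from inside each `.cell` row, as in the original snippet at the top of the thread, avoids the alignment question entirely.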

@zhaoqi1992

"The author information can be obtained from the href of user_avatar"

let $ = cheerio.load(sres.text);
let topics = $("#topic_list .cell");
let topicList = [];
topics.each((idx, element) => {
  let $element = $(element);
  let $topicTitle = $element.find(".topic_title");
  let $userAvatar = $element.find(".user_avatar");
  topicList.push({
    title: $topicTitle.attr("title"),
    href: $topicTitle.attr("href"),
    // The last path segment of the avatar link is the user name
    author: $userAvatar
      .attr("href")
      .split("/")
      .pop()
  });
});
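One caveat with the href approach: `.attr("href")` returns `undefined` when no element matched, so the chained `.split("/")` would throw for a row without an avatar link. Here is a minimal sketch of a more defensive version in plain JavaScript. `authorFromHref` is a hypothetical helper, and the `/user/<name>` href shape is an assumption based on the snippet above, not something confirmed in the thread.

```javascript
// Hypothetical helper: extract the user name from an avatar link's href.
// Assumes hrefs look like "/user/<name>"; that shape is an assumption.
function authorFromHref(href) {
  if (typeof href !== 'string' || href.length === 0) {
    return null; // nothing matched, or the attribute was empty
  }
  // Drop empty segments so a trailing slash does not yield "".
  var segments = href.split('/').filter(Boolean);
  return segments.length > 0 ? segments[segments.length - 1] : null;
}

console.log(authorFromHref('/user/alsotang')); // prints: alsotang
console.log(authorFromHref(undefined));        // prints: null
```

In the scraper this could replace the `.attr("href").split("/").pop()` chain, for example `author: authorFromHref($userAvatar.attr("href"))`.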


4 participants