&

分類:

推薦最新等你回答

笨尐豬回答

簡(jiǎn)單循環(huán)

最簡(jiǎn)單的方式就是循環(huán)拆分一下唄。先上最簡(jiǎn)單方法：

import pandas as pd
df = pd.DataFrame({'A':['1','2','3'],'B':['1','2,3','4,5,6'],'C':['3','3','3']})
result = pd.DataFrame(columns=['A','B','C'])
print(df,'\n')
for i in df.itertuples():
    for j in i[2].split(','):
        result = result.append({'A':i[1],'B':j,'C':i[3]},ignore_index=True)        
print(result)

輸出:

   A      B  C
0  1      1  3
1  2    2,3  3
2  3  4,5,6  3 

   A  B  C
0  1  1  3
1  2  2  3
2  2  3  3
3  3  4  3
4  3  5  3
5  3  6  3

更高效的方法

采用expand直接進(jìn)行擴(kuò)展

df = pd.DataFrame({'A':['1','2','3'],'B':['1','2,3','4,5,6'],'C':['3','3','3']})
df = (df.set_index(['A','C'])['B']
       .str.split(',', expand=True)
       .stack()
       .reset_index(level=2, drop=True)
       .reset_index(name='B'))
print(df)

react中的setState為什么可以這樣寫?

孤星回答

es6 簡(jiǎn)寫方式

在爬取網(wǎng)址時(shí)，如何用python的正則匹配？

笑浮塵回答

一樓的finditer方法是一個(gè)非常好的方法，它會(huì)返回一個(gè)迭代器，而不是返回所有的匹配數(shù)據(jù)，這樣的好處一個(gè)是節(jié)省內(nèi)存，另一個(gè)是能逐個(gè)輸出，樓主可以參考，謝謝

matplotlib畫圖越來(lái)越慢？

陌上花回答

盡少調(diào)用 plt.scatter 方法便可大幅提升性能.

詳解
假設(shè) WX_b 為 M N 矩陣, mx 為 M 1 矩陣, 下面代碼

for i in range(WX_b.shape[0]):
    for j in range(WX_b.shape[1]):
        plt.scatter(mx[i], WX_b[i][j])

可以優(yōu)化成

plt.scatter(mx.repeat(WX_b.shape[1], axis=1), WX_b)

jupyter 示例代碼

%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

WX_b = np.random.randn(30, 5)
mx = np.random.randn(WX_b.shape[0], 1)

def func1():
    for i in range(WX_b.shape[0]):
        for j in range(WX_b.shape[1]):
            plt.scatter(mx[i], WX_b[i][j])
            
def func2():
    plt.scatter(mx.repeat(WX_b.shape[1], axis=1), WX_b)
    
%time func1()
%time func2()

參考結(jié)果: func2 運(yùn)行時(shí)間大約是 func1 的 5%.

遞歸遍歷多維數(shù)組，返回遍歷前的該數(shù)組

吃藕丑回答

<?php

public function b($arr = array()) {
    if (!empty($arr)) {
        return "";
    } else {
        foreach ($arr as &$v) {
            if (is_array($v)) {
                $v = $this->b($v);
            } else {
                $v = $v + 1;
            }
        }
        return $arr;
    }
}

python strftime("%Y-%m-%d") 無(wú)法被識(shí)別

別瞎鬧回答

python 基礎(chǔ)有待加強(qiáng)

#df = ts.get_tick_data('601688',date='begin.strftime("%Y-%m-%d")') 
df = ts.get_tick_data('601688',date=begin.strftime("%Y-%m-%d"))

SRC中blob:https://的圖片URL ，如何獲取真正的鏈接地址（急）

離魂曲回答

簡(jiǎn)單粗暴的方法，截圖

數(shù)組操作問題

笑忘初回答

let result = arr2.filter((v, i)=>arr1[i] && /\D/.test(arr1[i]));
console.log(result);

知乎爬單個(gè)話題下的關(guān)注者，超過(guò)1W個(gè)就重復(fù)出現(xiàn)前20個(gè)

不討囍回答

csdn上面的，直接搬了過(guò)來(lái):

因?yàn)橐鲇^點(diǎn)，觀點(diǎn)的屋子類似于知乎的話題，所以得想辦法把他給爬下來(lái)，搞了半天最終還是妥妥的搞定了，代碼是python寫的，不懂得麻煩自學(xué)哈！懂得直接看代碼，絕對(duì)可用


#coding:utf-8
"""
@author:haoning
@create time:2015.8.5
"""
from __future__ import division  # 精確除法
from Queue import Queue
from __builtin__ import False
import json
import os
import re
import platform
import uuid
import urllib
import urllib2
import sys
import time
import MySQLdb as mdb
from bs4 import BeautifulSoup


reload(sys)
sys.setdefaultencoding( "utf-8" )


headers = {
   'User-Agent' : 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0',
   'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
   'X-Requested-With':'XMLHttpRequest',
   'Referer':'https://www.zhihu.com/topics',
   'Cookie':'__utma=51854390.517069884.1416212035.1416212035.1416212035.1; q_c1=c02bf44d00d240798bfabcfc95baeb56|1455778173000|1416205243000; _za=b1c8ae35-f986-46a2-b24a-cb9359dc6b2a; aliyungf_tc=AQAAAJ1m71jL1woArKqF22VFnL/wRy6C; _xsrf=9d494558f9271340ab24598d85b2a3c8; cap_id="MDNiMjcwM2U0MTRhNDVmYjgxZWVhOWI0NTA2OGU5OTg=|1455864276|2a4ce8247ebd3c0df5393bb5661713ad9eec01dd"; n_c=1; _alicdn_sec=56c6ba4d556557d27a0f8c876f563d12a285f33a'
}


DB_HOST = '127.0.0.1'
DB_USER = 'root'
DB_PASS = 'root'


queue= Queue() #接收隊(duì)列
nodeSet=set()
keywordSet=set()
stop=0
offset=-20
level=0
maxLevel=7
counter=0
base=""


conn = mdb.connect(DB_HOST, DB_USER, DB_PASS, 'zhihu', charset='utf8')
conn.autocommit(False)
curr = conn.cursor()


def get_html(url):
    try:
        req = urllib2.Request(url)
        response = urllib2.urlopen(req,None,3) #在這里應(yīng)該加入代理
        html = response.read()
        return html
    except:
        pass
    return None


def getTopics():
    url = 'https://www.zhihu.com/topics'
    print url
    try:
        req = urllib2.Request(url)
        response = urllib2.urlopen(req) #鍦ㄨ繖閲屽簲璇ュ姞鍏ヤ唬鐞?
        html = response.read().decode('utf-8')
        print html
        soup = BeautifulSoup(html)
        lis = soup.find_all('li', {'class' : 'zm-topic-cat-item'})
        
        for li in lis:
            data_id=li.get('data-id')
            name=li.text
            curr.execute('select id from classify_new where name=%s',(name))
            y= curr.fetchone()
            if not y:
                curr.execute('INSERT INTO classify_new(data_id,name)VALUES(%s,%s)',(data_id,name))
        conn.commit()
    except Exception as e:
        print "get topic error",e
        


def get_extension(name):  
    where=name.rfind('.')
    if where!=-1:
        return name[where:len(name)]
    return None




def which_platform():
    sys_str = platform.system()
    return sys_str


def GetDateString():
    when=time.strftime('%Y-%m-%d',time.localtime(time.time()))
    foldername = str(when)
    return foldername 


def makeDateFolder(par,classify):
    try:
        if os.path.isdir(par):
            newFolderName=par + '//' + GetDateString() + '//'  +str(classify)
            if which_platform()=="Linux":
                newFolderName=par + '/' + GetDateString() + "/" +str(classify)
            if not os.path.isdir( newFolderName ):
                os.makedirs( newFolderName )
            return newFolderName
        else:
            return None 
    except Exception,e:
        print "kk",e
    return None 


def download_img(url,classify):
    try:
        extention=get_extension(url)
        if(extention is None):
            return None
        req = urllib2.Request(url)
        resp = urllib2.urlopen(req,None,3)
        dataimg=resp.read()
        name=str(uuid.uuid1()).replace("-","")+"_www.guandn.com"+extention
        top="E://topic_pic"
        folder=makeDateFolder(top, classify)
        filename=None
        if folder is not None:
            filename  =folder+"http://"+name
        try:
            if "e82bab09c_m" in str(url):
                return True
            if not os.path.exists(filename):
                file_object = open(filename,'w+b')
                file_object.write(dataimg)
                file_object.close()
                return '/room/default/'+GetDateString()+'/'+str(classify)+"/"+name
            else:
                print "file exist"
                return None
        except IOError,e1:
            print "e1=",e1
            pass
    except Exception as e:
        print "eee",e
        pass
    return None #如果沒有下載下來(lái)就利用原來(lái)網(wǎng)站的鏈接


def getChildren(node,name):
    global queue,nodeSet
    try:
        url="https://www.zhihu.com/topic/"+str(node)+"/hot"
        html=get_html(url)
        if html is None:
            return
        soup = BeautifulSoup(html)
        p_ch='父話題'
        node_name=soup.find('div', {'id' : 'zh-topic-title'}).find('h1').text
        topic_cla=soup.find('div', {'class' : 'child-topic'})
        if topic_cla is not None:
            try:
                p_ch=str(topic_cla.text)
                aList = soup.find_all('a', {'class' : 'zm-item-tag'}) #獲取所有子節(jié)點(diǎn)
                if u'子話題' in p_ch:
                    for a in aList:
                        token=a.get('data-token')
                        a=str(a).replace('\n','').replace('\t','').replace('\r','')
                        start=str(a).find('>')
                        end=str(a).rfind('</a>')
                        new_node=str(str(a)[start+1:end])
                        curr.execute('select id from rooms where name=%s',(new_node)) #先保證名字絕不相同
                        y= curr.fetchone()
                        if not y:
                            print "y=",y,"new_node=",new_node,"token=",token
                            queue.put((token,new_node,node_name))
            except Exception as e:
                print "add queue error",e
    except Exception as e:
        print "get html error",e
        
    


def getContent(n,name,p,top_id):
    try:
        global counter
        curr.execute('select id from rooms where name=%s',(name)) #先保證名字絕不相同
        y= curr.fetchone()
        print "exist?? ",y,"n=",n
        if not y:
            url="https://www.zhihu.com/topic/"+str(n)+"/hot"
            html=get_html(url)
            if html is None:
                return
            soup = BeautifulSoup(html)
            title=soup.find('div', {'id' : 'zh-topic-title'}).find('h1').text
            pic_path=soup.find('a',{'id':'zh-avartar-edit-form'}).find('img').get('src')
            description=soup.find('div',{'class':'zm-editable-content'})
            if description is not None:
                description=description.text
                
            if (u"未歸類" in title or u"根話題" in title): #允許入庫(kù)，避免死循環(huán)
                description=None
                
            tag_path=download_img(pic_path,top_id)
            print "tag_path=",tag_path
            if (tag_path is not None) or tag_path==True:
                if tag_path==True:
                    tag_path=None
                father_id=2 #默認(rèn)為雜談
                curr.execute('select id from rooms where name=%s',(p))
                results = curr.fetchall()
                for r in results:
                    father_id=r[0]
                name=title
                curr.execute('select id from rooms where name=%s',(name)) #先保證名字絕不相同
                y= curr.fetchone()
                print "store see..",y
                if not y:
                    friends_num=0
                    temp = time.time()
                    x = time.localtime(float(temp))
                    create_time = time.strftime("%Y-%m-%d %H:%M:%S",x) # get time now
                    create_time
                    creater_id=None
                    room_avatar=tag_path
                    is_pass=1
                    has_index=0
                    reason_id=None  
                    #print father_id,name,friends_num,create_time,creater_id,room_avatar,is_pass,has_index,reason_id
                    ######################有資格入庫(kù)的內(nèi)容
                    counter=counter+1
                    curr.execute("INSERT INTO rooms(father_id,name,friends_num,description,create_time,creater_id,room_avatar,is_pass,has_index,reason_id)VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)",(father_id,name,friends_num,description,create_time,creater_id,room_avatar,is_pass,has_index,reason_id))
                    conn.commit() #必須時(shí)時(shí)進(jìn)入數(shù)據(jù)庫(kù)，不然找不到父節(jié)點(diǎn)
                    if counter % 200==0:
                        print "current node",name,"num",counter
    except Exception as e:
        print "get content error",e       


def work():
    global queue
    curr.execute('select id,node,parent,name from classify where status=1')
    results = curr.fetchall()
    for r in results:
        top_id=r[0]
        node=r[1]
        parent=r[2]
        name=r[3]
        try:
            queue.put((node,name,parent)) #首先放入隊(duì)列
            while queue.qsize() >0:
                n,p=queue.get() #頂節(jié)點(diǎn)出隊(duì)
                getContent(n,p,top_id)
                getChildren(n,name) #出隊(duì)內(nèi)容的子節(jié)點(diǎn)
            conn.commit()
        except Exception as e:
            print "what's wrong",e  
            
def new_work():
    global queue
    curr.execute('select id,data_id,name from classify_new_copy where status=1')
    results = curr.fetchall()
    for r in results:
        top_id=r[0]
        data_id=r[1]
        name=r[2]
        try:
            get_topis(data_id,name,top_id)
        except:
            pass




def get_topis(data_id,name,top_id):
    global queue
    url = 'https://www.zhihu.com/node/TopicsPlazzaListV2'
    isGet = True;
    offset = -20;
    data_id=str(data_id)
    while isGet:
        offset = offset + 20
        values = {'method': 'next', 'params': '{"topic_id":'+data_id+',"offset":'+str(offset)+',"hash_id":""}'}
        try:
            msg=None
            try:
                data = urllib.urlencode(values)
                request = urllib2.Request(url,data,headers)
                response = urllib2.urlopen(request,None,5)
                html=response.read().decode('utf-8')
                json_str = json.loads(html)
                ms=json_str['msg']
                if len(ms) <5:
                    break
                msg=ms[0]
            except Exception as e:
                print "eeeee",e
            #print msg
            if msg is not None:
                soup = BeautifulSoup(str(msg))
                blks = soup.find_all('div', {'class' : 'blk'})
                for blk in blks:
                    page=blk.find('a').get('href')
                    if page is not None:
                        node=page.replace("/topic/","") #將更多的種子入庫(kù)
                        parent=name
                        ne=blk.find('strong').text
                        try:
                            queue.put((node,ne,parent)) #首先放入隊(duì)列
                            while queue.qsize() >0:
                                n,name,p=queue.get() #頂節(jié)點(diǎn)出隊(duì)
                                size=queue.qsize()
                                if size > 0:
                                    print size
                                getContent(n,name,p,top_id)
                                getChildren(n,name) #出隊(duì)內(nèi)容的子節(jié)點(diǎn)
                            conn.commit()
                        except Exception as e:
                            print "what's wrong",e  
        except urllib2.URLError, e:
            print "error is",e
            pass 
            
        
if __name__ == '__main__':
    i=0
    while i<400:
        new_work()
        i=i+1

說(shuō)下數(shù)據(jù)庫(kù)的問題，我這里就不傳附件了，看字段自己建立，因?yàn)檫@確實(shí)太簡(jiǎn)單了，我是用的mysql，你看自己的需求自己建。

有什么不懂得麻煩去去轉(zhuǎn)盤網(wǎng)找我，因?yàn)檫@個(gè)也是我開發(fā)的，上面會(huì)及時(shí)更新qq群號(hào)，這里不留qq號(hào)啥的，以免被系統(tǒng)給K了。

一個(gè)關(guān)于數(shù)組的問題

失魂人回答

因?yàn)槟阒籱akeRow了一次，矩陣中的每一“行” 都引用了同一個(gè)數(shù)組，你改矩陣中的值就相當(dāng)于改 “行” 中的一個(gè)

網(wǎng)絡(luò)爬蟲js 加密破解

喜歡你回答

我去年爬IT桔子的時(shí)候也卡在了登錄這里，后來(lái)我是直接把登入后的cookies放到程序中解決的。。。

js使用concat拼接的對(duì)象數(shù)組在遍歷賦值時(shí)為什么會(huì)賦予同一值？

下墜回答

又是引用問題

var a = {};
var b = a;
b.id = 1;
console.log(a)//{ id: 1 }

為何pandas.read_csv不能讀取中文內(nèi)容而使用f=open()之后就可以了？

祉小皓回答

因?yàn)閞ead_csv的第一個(gè)參數(shù)是：

filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)

所以可以接受open之后的io對(duì)象，而open函數(shù)是支持中文名字的，所以不會(huì)出現(xiàn)打開錯(cuò)誤

怎樣用bs4的find或是select方法獲取我所需要的這行？

鹿惑回答

頁(yè)面應(yīng)該是有做過(guò)反爬蟲處理的，有關(guān)數(shù)據(jù)在html源碼中是被注釋掉的，可以先把注釋符號(hào)去掉再進(jìn)行解析

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.basketball-reference.com/teams/MIN/2018.html#all_per_game')
// 去掉html的注釋符號(hào)，并進(jìn)行解析
soup = BeautifulSoup(r.text.replace('<!--','').replace('-->',''),'lxml')
trs = soup.select('#per_game > tbody > tr')
print(trs[0])

如何實(shí)現(xiàn)兩個(gè)dataframe相減？

紓惘回答

如果你dataframe2的index和dataframe1是一致的
dataframe1.drop(dataframe2.index)

vue-es6拼接的字符串可以使用vue-Methods方法嗎？

焚音回答

你的點(diǎn)擊事件是加在"駕駛員"這個(gè)span標(biāo)簽上的

javascript 如何刪除數(shù)組中特定的值？

陌顏回答

直接過(guò)濾掉那個(gè)index不就可以了嗎

var arr = [{index: 1, a: "1", b: "2", c: "3", d: "4"},{index: 2, a: "4", b: "5", c: "6", d: "7"}];
var result = arr.filter(o=>o.index != 1);
console.log(result);

String類型的數(shù)組，如何解析成正常的數(shù)組類型？

荒城回答

你提供的json字符串不是一個(gè)有效的json字符串。

用下面的看看：

var json =  '[{     "title": "演出時(shí)長(zhǎng)",     "desc": "2" }, {     "title": "入場(chǎng)時(shí)間",     "desc": "這是入場(chǎng)時(shí)間" }, {     "title": "限購(gòu)說(shuō)明",     "desc": "每單限購(gòu)6張" }, {     "title": "座位類型",     "desc": "請(qǐng)按門票對(duì)應(yīng)座位，有序?qū)μ?hào)入座" }, {     "title": "兒童入場(chǎng)提示 ",     "desc": "1.2米以上憑票入場(chǎng)，1.2米以下謝絕入場(chǎng)" }, {     "title": "禁止攜帶物品",     "desc": "食品、飲料、相機(jī)、充電寶、打火機(jī)等" }, {     "title": "演出語(yǔ)言",     "desc": "中文" }, {     "title": "演出形式",     "desc": "這是演出形式" }, {     "title": "其他說(shuō)明",     "desc": "這是一段購(gòu)買須知" }, {     "title": "實(shí)體票",     "desc": "本項(xiàng)目支持憑實(shí)體票入場(chǎng)，支持以下取票方式： -快遞配送： 運(yùn)費(fèi)10元(V2及以上會(huì)員包郵)， 順豐發(fā)貨。 -上門自提： 前往門店自取， 門店店))。 -現(xiàn)場(chǎng)取票： 工作人員將在。" }, {     "title ": "電子票 ",     "desc ": "本項(xiàng)目支持憑電子票入場(chǎng)。 -現(xiàn)場(chǎng)掃碼驗(yàn)票或憑姓名手機(jī)號(hào)入場(chǎng)（ 以現(xiàn)場(chǎng)為準(zhǔn)）； 掃碼驗(yàn)票流程： 打開APP→ 訂單詳情→ 票券詳情→ 現(xiàn)場(chǎng)掃碼入場(chǎng)。 " }]';

var obj = JSON.parse(json);

console.log(obj[0].title);

HTML5拖曳事件如何阻止拖曳到其他元素？

空痕回答

event.dataTransfer.setData('text/plain', '')

IT 桔子爬蟲

毀憶回答

from selenium import webdriver
import time
with open('../password.txt', 'r') as r:
    username, password = r.readline().split(',')

chrome = webdriver.Chrome()
chrome.get('https://www.itjuzi.com/user/login')
time.sleep(2)
chrome.find_element_by_xpath('//*[@id="create_account_email"]').send_keys(username)
chrome.find_element_by_xpath('//*[@id="create_account_password"]').send_keys(password)
chrome.find_element_by_xpath('//*[@id="login_btn"]').click()

最簡(jiǎn)單的登陸方式