import multiprocessing

def calcutype(dataframe, model, xiangguandict):
    '''Main function'''
    typelist = {}
    xiangguan = {}
    res = {}
    pool = multiprocessing.Pool(40)
    for index, row in dataframe.iterrows():
        scorearr = []
        name = row['data_name1'].split(',')  # data_name
        descrip = row['data_descrip1'].split(',')  # data_descrip
        # Classification and correlation
        res[index] = pool.apply_async(calcumodelnum, (name, descrip, model, xiangguandict))
    pool.close()
    pool.join()
    for i in res:
        # The correlation calculation for this classification is complete
        typelist[i] = res[i].get()[0]
        xiangguan[i] = res[i].get()[1]
    return typelist, xiangguan

def calcumodelnum(**********):
    **********
    print(multiprocessing.current_process().name + ' has finished computing.')
    return results

def main():
    data1 = **********
    model = **********
    dict1 = **********
    calcutype(data1, model, dict1)

if __name__ == '__main__':
    main()

As the code above shows, I am running on a 40-core machine, so I start 40 processes. But look at the CPU usage.

There are clearly plenty of tasks, yet the CPUs are not fully loaded and many cores sit idle.

At moments like this, nothing moves at all.

The log shows that 40 processes were indeed started, but they sit there as if they had never started.

The running log is as follows:

I then tried starting only 10 processes, and the program took almost as long as it did with 40!

What is going on? Any pointers from the experts would be appreciated. (The for loop is very long, and the computation is heavy enough.)

Answer 0:

multiprocessing.Pool only starts multiple processes; it does not start one process on each core. In other words, the Python interpreter itself does not balance load across cores or processors. That is decided by the operating system. If your job is particularly compute-intensive, the operating system will schedule it onto more cores, but this is not something multiprocessing.Pool controls or specifies.

The num in multiprocessing.Pool(num) can be very small or very large. For I/O-intensive work, for example, it can be greater than the number of CPUs.

Hardware resource allocation is decided by the operating system, so if you want every core to be busy, you need to look at it from the operating-system side.
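To check how many cores the operating system actually makes available to the Python process, something like the following could be used. This is only a minimal sketch: `usable_cores` is a hypothetical helper name, and `os.sched_getaffinity` exists only on some platforms (such as Linux), so the code falls back to `os.cpu_count()` elsewhere.

```python
import os

def usable_cores():
    """Return the number of cores this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux and some other Unix systems: the set of cores the
        # scheduler may use for the current process (pid 0 = self)
        return len(os.sched_getaffinity(0))
    # Fallback: total number of logical CPUs (cpu_count may return None)
    return os.cpu_count() or 1

if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    print("cores usable by this process:", usable_cores())
```

If the second number is smaller than the pool size, a pool of 40 workers cannot keep 40 cores busy regardless of how the tasks are written.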

Answer 1:

The computation in each task is not heavy enough, so the earlier processes have already finished their work.
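If the per-row work really is light, the cost of sending each task to a worker can dominate the run time. One common way to make each dispatched unit heavier is the `chunksize` argument of `Pool.map`, which hands rows to workers in batches. The sketch below is illustrative only: `score_row` is a made-up stand-in for the real `calcumodelnum`, and the input values are invented.

```python
from multiprocessing import Pool

def score_row(n):
    """Stand-in for the real per-row computation (hypothetical)."""
    return sum(i * i for i in range(n % 1000))

def score_all(values, processes=4, chunksize=256):
    """Score every value in parallel, sending work to the pool in chunks."""
    with Pool(processes) as pool:
        # chunksize groups many small tasks into one message per worker,
        # which reduces inter-process communication overhead
        return pool.map(score_row, values, chunksize=chunksize)

if __name__ == "__main__":
    values = list(range(10_000))
    results = score_all(values)
    print(len(results))
```

Whether this helps depends on how expensive a single row is; timing the run with a few different `chunksize` values is the simplest way to find out.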

Answer 2:

I think the explanation here is clearer: https://…

Answer 3:

Asynchronous process pool (non-blocking)
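For reference, a non-blocking pool submits every task with `apply_async` first and only calls `get()` afterwards. Calling `get()` inside the submission loop would make the tasks run one after another. A minimal sketch, with `slow_square` as a made-up task and `run_async` as a hypothetical helper name:

```python
from multiprocessing import Pool

def slow_square(x):
    """Made-up CPU-bound task used only for illustration."""
    total = 0
    for _ in range(100_000):
        total += x * x
    return total // 100_000

def run_async(values, processes=4):
    """Submit all tasks without blocking, then collect the results."""
    with Pool(processes) as pool:
        # apply_async returns an AsyncResult immediately, so the loop
        # finishes submitting before any result is awaited
        pending = {v: pool.apply_async(slow_square, (v,)) for v in values}
        # Block on the results only now, while the pool is still alive
        return {v: r.get() for v, r in pending.items()}

if __name__ == "__main__":
    print(run_async([1, 2, 3]))
```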

Answer 4:

With the code modified as follows, it should be able to use the CPUs fully:

from multiprocessing import Pool, cpu_count, current_process

data1 = '...'  # dataframe
model = '...'
dict1 = '...'

def calcumodelnum(name, des):
    """your computing logic"""
    pass

def calcutype(row):
    index, data = row
    name = data['data_name1'].split(',')  # data_name
    descrip = data['data_descrip1'].split(',')  # data_descrip
    result = calcumodelnum(name, descrip)
    print(current_process().name + ' has finished.')
    # Return the result: a global dict filled in a worker process
    # is not visible to the parent process
    return index, result

def main():
    typelist = {}
    xiangguan = {}
    pool = Pool(cpu_count())
    # map blocks until every row has been processed
    results = pool.map(calcutype, data1.iterrows())
    pool.close()
    pool.join()
    for index, result in results:
        # The correlation calculation for this classification is complete
        typelist[index] = result[0]
        xiangguan[index] = result[1]
    return typelist, xiangguan

if __name__ == '__main__':
    main()

Give it a try.
