Article From: https://www.cnblogs.com/areyouready/p/9906473.html
1、Efficiency bottlenecks of MR programs
Function: distributed offline computing
Computer performance: CPU, memory, disk, network
I/O operation optimization:
(1) Data skew (code optimization)
(2) Unreasonable settings for the number of map and reduce tasks
(3) Map tasks run too long, causing reduce tasks to wait too long
(4) Small files (merge them with CombineTextInputFormat)
(5) Very large non-splittable files (continuous spilling)
(6) Many small spill files that require multiple merges
2、MR optimization methods
Six aspects are considered: data input, Map stage, Reduce stage, I/O transmission, data skew, and parameter tuning.
1->Data input
(1) Merge small files: merge small files before running the MR job
(2) Use CombineTextInputFormat as the input format to handle a large number of small files on the input side
MR is not suitable for handling large numbers of small files.
2->Map stage
(1) Reduce the number of spills (increase the sort buffer memory, e.g. to 200M, with a spill threshold of 80%)
        <property>
            <name>mapreduce.task.io.sort.mb</name>
            <value>100</value>
        </property>
        <property>
            <name>mapreduce.map.sort.spill.percent</name>
            <value>0.80</value>
        </property>
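As a rough illustration of why a larger sort buffer reduces spills, the sketch below estimates the spill count for a single map task. It assumes a spill is triggered each time the buffer fills to `sort_mb * spill_percent` and ignores record metadata overhead, so real counts will differ. `estimate_spills` is a hypothetical helper for illustration, not a Hadoop API.

```python
import math


def estimate_spills(map_output_mb: float, sort_mb: int = 100,
                    spill_percent: float = 0.80) -> int:
    """Rough spill count for one map task.

    Assumption: one spill happens each time sort_mb * spill_percent
    megabytes of map output have accumulated in the buffer.
    """
    if map_output_mb <= 0:
        return 0
    usable_mb = sort_mb * spill_percent
    return math.ceil(map_output_mb / usable_mb)


# A map task producing 400 MB of output (a made-up figure),
# with a 100 MB buffer versus a 200 MB buffer.
spills_100 = estimate_spills(400, sort_mb=100)
spills_200 = estimate_spills(400, sort_mb=200)
print(spills_100, spills_200)
```

Under these assumptions, doubling the buffer roughly halves the number of spill files, which in turn leaves fewer files to merge afterwards.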
(2) Reduce the number of merges
        <property>
            <name>mapreduce.task.io.sort.factor</name>
            <value>10</value>
        </property>
(3) Use a Combiner after the map, provided it does not affect the business logic
3->Reduce stage
(1) Set the number of map and reduce tasks reasonably
(2) Let map and reduce coexist: starting the reduce tasks once the maps have progressed to a certain extent reduces the time reduce spends waiting
        <property>
            <name>mapreduce.job.reduce.slowstart.completedmaps</name>
            <value>0.05</value>
        </property>
(3) Set the reduce-side buffer reasonably
        <property>
            <name>mapreduce.reduce.markreset.buffer.percent</name>
            <value>0.0</value>
        </property>
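To make the effect of `mapreduce.job.reduce.slowstart.completedmaps` concrete, the sketch below computes how many map tasks must finish before reduce tasks can start. It assumes the threshold is the fraction multiplied by the total number of maps, rounded up. `maps_before_reduce_start` is a hypothetical helper and the job size of 200 maps is made up.

```python
import math


def maps_before_reduce_start(total_maps: int, slowstart: float = 0.05) -> int:
    """Completed map tasks required before reduce tasks may be scheduled.

    Assumption: threshold = ceil(slowstart * total_maps).
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0 and 1")
    return math.ceil(slowstart * total_maps)


# Compare an early start, a halfway start, and waiting for every map.
thresholds = {f: maps_before_reduce_start(200, f) for f in (0.05, 0.5, 1.0)}
print(thresholds)
```

A small fraction lets reduce tasks begin copying map output early, at the cost of holding reduce containers while maps are still running. A value of 1.0 makes reduce wait until every map has finished.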
4->I/O transmission
(1) Data compression
(2) Use SequenceFile
5->Data skew
(1) Range partitioning
(2) Custom partitioning
(3) Combine
(4) Use map join where possible instead of reduce join
6->Parameter tuning
Set the number of cores
Map core count setting:
        <property>
            <name>mapreduce.map.cpu.vcores</name>
            <value>1</value>
        </property>
Reduce core count setting:
        <property>
            <name>mapreduce.reduce.cpu.vcores</name>
            <value>1</value>
        </property>
Set memory
Map task memory setting:
        <property>
            <name>mapreduce.map.memory.mb</name>
            <value>1024</value>
        </property>
Reduce task memory setting:
        <property>
            <name>mapreduce.reduce.memory.mb</name>
            <value>1024</value>
        </property>
Parallelism with which reduce fetches data from the map side:
        <property>
            <name>mapreduce.reduce.shuffle.parallelcopies</name>
            <value>5</value>
        </property>
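The map join listed under data skew can be sketched in plain Python, outside Hadoop. The small table is held in an in-memory dictionary, and each record of the large table is joined as it is read, so no shuffle or reduce-side join is needed and a hot key cannot overload a single reducer. The table contents and the names `build_lookup` and `map_join` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def build_lookup(small_table: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Load the small table into memory.

    In a Hadoop job this step would typically run once per map task,
    for example in the mapper's setup method.
    """
    return dict(small_table)


def map_join(big_table: Iterable[Tuple[str, str]],
             lookup: Dict[str, str]) -> Iterator[Tuple[str, str, str]]:
    """Inner-join each big-table record against the in-memory lookup."""
    for key, value in big_table:
        if key in lookup:  # records without a match are dropped
            yield key, value, lookup[key]


# Hypothetical data: a small product table and a larger order table.
products = [("p1", "keyboard"), ("p2", "mouse")]
orders = [("p1", "order-1001"), ("p1", "order-1002"),
          ("p3", "order-1003"), ("p2", "order-1004")]

joined = list(map_join(orders, build_lookup(products)))
for row in joined:
    print(row)
```

This only works when the small table fits comfortably in each map task's memory; otherwise a reduce-side join with one of the other skew remedies is the fallback.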

 

