Convolutional Neural Networks with Low-Rank Regularization
(Submitted on 19 Nov 2015 (v1), last revised 14 Feb 2016 (this version, v3))
Abstract: Large CNNs have delivered impressive performance in various computer vision applications, but their storage and computation requirements make it problematic to deploy these models on mobile devices. Recently, tensor decompositions have been used for speeding up CNNs. In this paper, we further develop the tensor decomposition technique. We propose a new algorithm for computing the low-rank tensor decomposition for removing the redundancy in the convolution kernels. The algorithm finds the exact global optimizer of the decomposition and is more effective than iterative methods. Based on the decomposition, we further propose a new method for training low-rank constrained CNNs from scratch. Interestingly, while achieving a significant speedup, the low-rank constrained CNNs sometimes deliver significantly better performance than their non-constrained counterparts. On the CIFAR-10 dataset, the proposed low-rank NIN model achieves $91.31\%$ accuracy (without data augmentation), which also improves upon the state-of-the-art result. We evaluated the proposed method on the CIFAR-10 and ILSVRC12 datasets for a variety of modern CNNs, including AlexNet, NIN, VGG and GoogleNet, with success. For example, the forward time of VGG-16 is reduced by half while the performance remains comparable. This empirical success suggests that low-rank tensor decompositions can be a very useful tool for speeding up large CNNs.
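To make the low-rank idea concrete, here is a minimal NumPy sketch that replaces a convolution kernel with a rank-$r$ factorization obtained by truncated SVD. This is only an illustration of the general technique, not the paper's algorithm (the paper computes an exact global optimizer of a particular tensor decomposition); the layer shapes and the rank below are made-up values for the example.

```python
import numpy as np

# Illustrative low-rank factorization of a conv kernel via truncated SVD.
# NOTE: shapes and rank are assumptions for this sketch, not from the paper.
C, N, k = 64, 128, 3          # input channels, output channels, kernel size
r = 16                        # target rank, r << min(N, C*k*k)

W = np.random.randn(N, C, k, k)           # original conv kernel

# Flatten each filter and take a truncated SVD.
W_mat = W.reshape(N, C * k * k)           # (N, C*k*k)
U, s, Vt = np.linalg.svd(W_mat, full_matrices=False)
U_r = U[:, :r] * s[:r]                    # (N, r), singular values folded in
V_r = Vt[:r, :]                           # (r, C*k*k)

# The rank-r approximation splits the layer into two cheaper ones:
#   1) an r-filter conv with kernels V_r.reshape(r, C, k, k)
#   2) a 1x1 conv mapping r -> N channels with weights U_r
W_approx = (U_r @ V_r).reshape(N, C, k, k)

print("relative error:", np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```

In a network, the two factors correspond to an $r$-filter $k \times k$ convolution followed by a $1 \times 1$ convolution from $r$ to $N$ channels, which reduces the multiply count per output location from $NCk^2$ to $r(Ck^2 + N)$.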
Submission history
From: Cheng Tai
[v1] Thu, 19 Nov 2015 06:13:55 UTC (612 KB)
[v2] Thu, 10 Dec 2015 23:46:17 UTC (636 KB)
[v3] Sun, 14 Feb 2016 03:46:09 UTC (781 KB)
Subjects: cs.CV; stat.ML
DBLP - CS Bibliography: Cheng Tai, Tong Xiao, Xiaogang Wang, Weinan E
Speeding Up Distributed Machine Learning Using Codes
(Submitted on 8 Dec 2015 (v1), last revised 29 Jan 2018 (this version, v3))
Abstract: Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems there are several types of noise that can affect the performance of distributed machine learning algorithms -- straggler nodes, system failures, or communication bottlenecks -- but there has been little interaction cutting across codes, machine learning, and distributed systems. In this work, we provide theoretical insights on how coded solutions can achieve significant gains compared to uncoded ones. We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling. For matrix multiplication, we use codes to alleviate the effect of stragglers, and show that if the number of homogeneous workers is $n$, and the runtime of each subtask has an exponential tail, coded computation can speed up distributed matrix multiplication by a factor of $\log n$. For data shuffling, we use codes to reduce communication bottlenecks, exploiting the excess in storage. We show that when a constant fraction $\alpha$ of the data matrix can be cached at each worker, and $n$ is the number of workers, \emph{coded shuffling} reduces the communication cost by a factor of $(\alpha + \frac{1}{n})\gamma(n)$ compared to uncoded shuffling, where $\gamma(n)$ is the ratio of the cost of unicasting $n$ messages to $n$ users to multicasting a common message (of the same size) to $n$ users. For instance, $\gamma(n) \simeq n$ if multicasting a message to $n$ users is as cheap as unicasting a message to one user. We also provide experimental results corroborating the theoretical gains of the coded algorithms.
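To make the coded-computation idea concrete, here is a toy NumPy sketch of MDS-coded matrix-vector multiplication: the matrix is split into $k$ blocks, $n > k$ workers each compute one coded linear combination of the block products, and the results of any $k$ workers suffice to decode, so stragglers can simply be ignored. The block sizes, the Gaussian encoding matrix, and the straggler model below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Toy sketch of straggler-tolerant coded matrix-vector multiplication.
# NOTE: all sizes and the encoding scheme are assumptions for illustration.
rng = np.random.default_rng(0)
k, n = 4, 6                    # split work into k blocks, encode into n tasks
m, d = 8, 5                    # rows per block, vector length

A_blocks = [rng.standard_normal((m, d)) for _ in range(k)]
x = rng.standard_normal(d)

# Encode: a random Gaussian G is MDS with probability 1
# (any k of its n rows form an invertible matrix).
G = rng.standard_normal((n, k))
coded = [sum(G[j, i] * A_blocks[i] for i in range(k)) for j in range(n)]

# Each of the n workers computes its coded product; suppose only the
# fastest k respond and the rest are stragglers.
results = {j: coded[j] @ x for j in range(n)}
survivors = sorted(rng.choice(n, size=k, replace=False))

# Decode: solve G_S @ [A_1 x; ...; A_k x] = received results.
G_S = G[survivors, :]                            # (k, k), invertible w.p. 1
R = np.stack([results[j] for j in survivors])    # (k, m)
decoded = np.linalg.solve(G_S, R)                # row i recovers A_i @ x

exact = np.stack([B @ x for B in A_blocks])
print("max error:", np.abs(decoded - exact).max())
```

Waiting only for the fastest $k$ of $n$ tasks, rather than for all $n$, is what yields the $\log n$ speedup under the exponential-tail runtime model described above.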
Submission history
From: Kangwook Lee
[v1] Tue, 8 Dec 2015 21:54:04 UTC (2,376 KB)
[v2] Thu, 10 Dec 2015 19:34:37 UTC (2,376 KB)
[v3] Mon, 29 Jan 2018 03:04:14 UTC (833 KB)
Subjects: cs.IT; cs.LG; cs.PF; math.IT
DBLP - CS Bibliography: Kangwook Lee, Maximilian Lam, Ramtin Pedarsani, Dimitris Papailiopoulos