Name: Sketching


Employ sketching algorithms to create compact, representative summaries of the data.


How can we reduce the size of the input dataset?




Known Uses

To deal with the large label size problem, recent works propose sketch-based methods that approximate the distribution of labels per node, thereby achieving a space reduction from O(m) to O(log m) under certain conditions.

Sketching and Neural Networks

Binary embeddings with structured hashed projections

Analysis of a randomized approximation scheme for matrix multiplication
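As a concrete illustration of the kind of sketch that approximates label counts in sublinear space, here is a minimal count-min sketch in Python. This is a standard streaming data structure, not the construction of any specific work listed here; the hash scheme and parameters are illustrative only.

```python
import hashlib


class CountMinSketch:
    """Minimal count-min sketch: approximates per-label counts using
    depth x width counters instead of one counter per distinct label.

    Estimates never undercount; they overcount by a small amount that
    shrinks as width grows, with failure probability shrinking in depth.
    """

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, label):
        # Salt the label with the row number so each row hashes differently.
        digest = hashlib.sha256(f"{row}:{label}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, label, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, label)] += count

    def estimate(self, label):
        # Take the minimum over rows to reduce the effect of collisions.
        return min(self.table[row][self._index(row, label)]
                   for row in range(self.depth))
```

Usage: `add` each observed label, then `estimate` any label's count; total memory is fixed at width x depth counters regardless of how many distinct labels appear.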

This note gives a simple analysis of a randomized approximation scheme for matrix multiplication proposed by [Sar06], based on a random rotation followed by uniform column sampling. The result follows from a matrix version of Bernstein’s inequality and a tail inequality for quadratic forms in subgaussian random vectors.

Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments
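The rotation-plus-uniform-sampling idea behind that approximation scheme can be sketched in NumPy roughly as follows. This is a simplified illustration under stated assumptions (a Haar-random rotation via QR, sampling with replacement), not the authors' implementation.

```python
import numpy as np


def approx_matmul(A, B, s, rng=None):
    """Approximate A @ B (A is m x n, B is n x p) by a random rotation of
    the shared dimension followed by uniform sampling of s columns of A
    and the matching rows of B, rescaled so the estimate is unbiased."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    # Random orthogonal rotation (QR of a Gaussian matrix); this spreads
    # out the "mass" so uniform sampling works well afterwards.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    Ar, Br = A @ Q, Q.T @ B           # exact: Ar @ Br == A @ B
    idx = rng.choice(n, size=s, replace=True)
    return (n / s) * (Ar[:, idx] @ Br[idx, :])
```

Larger `s` reduces the variance of the estimate; with `s = n` and no sampling the product would be exact, since the rotation alone does not change `A @ B`.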

Our main focus is on the underlying theory and practical implementation of random projection and random sampling algorithms for very large, very overdetermined (i.e., overconstrained) ℓ1 and ℓ2 regression problems. Randomization can be used in one of two related ways: either to construct sub-sampled problems that can be solved, exactly or approximately, with traditional numerical methods, or to construct preconditioned versions of the original full problem that are easier to solve with traditional iterative algorithms.

Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data
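The first of those two uses of randomization, constructing a smaller sub-problem and solving it directly, can be illustrated with a minimal sketch-and-solve routine for overdetermined ℓ2 regression. A dense Gaussian sketch is used here for brevity; practical implementations typically prefer faster structured transforms such as the subsampled randomized Hadamard transform.

```python
import numpy as np


def sketched_lstsq(A, b, sketch_rows, rng=None):
    """Approximately solve min_x ||Ax - b||_2 for a tall A (n x d, n >> d)
    by compressing the n-row problem to sketch_rows rows with a Gaussian
    sketch S, then solving the small least-squares problem exactly."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```

The second use mentioned in the abstract would instead factor the sketched matrix (e.g. `S @ A = Q R`) and run an iterative solver such as LSQR on the preconditioned full problem, which converges quickly because the preconditioned system is well-conditioned.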

Related Patterns



[1] Zhang, Xiangyu, et al. “Efficient and accurate approximations of nonlinear convolutional networks.” arXiv preprint arXiv:1411.4229 (2014).

[2] Tai, Cheng, et al. “Convolutional neural networks with low-rank regularization.” arXiv preprint arXiv:1511.06067 (2015).

Short and Deep: Sketching and Neural Networks

Data-independent methods for dimensionality reduction such as random projections, sketches, and feature hashing have become increasingly popular in recent years. These methods often seek to reduce dimensionality while preserving the hypothesis class, resulting in inherent lower bounds on the size of projected data. We show that the dimensionality can be reduced further while maintaining performance guarantees, using improper learning with a slightly larger hypothesis class. In particular, we show that any sparse polynomial function of a sparse binary vector can be computed from a compact sketch by a single-layer neural network, where the sketch size has a logarithmic dependence on the polynomial degree. We empirically show that our approach leads to networks with fewer parameters than related methods such as feature hashing, at equal or better performance.
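A much simpler data-independent sketch in the same family is feature hashing. The toy helper below is a hypothetical illustration of that baseline, not the paper's construction: it maps a sparse binary vector, given by its set of active indices, into a fixed number of signed hash buckets.

```python
import hashlib

import numpy as np


def feature_hash(active_indices, dim):
    """Sketch a sparse binary vector into `dim` signed hash buckets
    (the classic hashing trick). Deterministic: the bucket and sign for
    each index are derived from a SHA-256 digest of the index."""
    v = np.zeros(dim)
    for i in active_indices:
        h = int(hashlib.sha256(str(i).encode()).hexdigest(), 16)
        bucket = h % dim
        sign = 1.0 if (h // dim) % 2 == 0 else -1.0
        v[bucket] += sign
    return v
```

Because the mapping is deterministic, the same sparse input always produces the same dense sketch, so the sketch can be computed independently on different machines and fed to a downstream model of fixed input width `dim`.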