De Bruijn sequence
In combinatorial mathematics, a de Bruijn sequence of order n on a size-k alphabet A is a cyclic sequence in which every possible length-n string on A occurs exactly once as a substring (i.e., as a contiguous subsequence). Such a sequence is denoted by B(k, n) and has length k^n, which is also the number of distinct strings of length n on A. Each of these distinct strings, when taken as a substring of B(k, n), must start at a different position, because substrings starting at the same position are not distinct. Therefore, B(k, n) must have at least k^n symbols. Since B(k, n) has exactly k^n symbols, de Bruijn sequences are optimally short with respect to the property of containing every string of length n exactly once.
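The definition can be checked mechanically. The following is a minimal sketch, not a reference implementation; the function name is_de_bruijn and its parameters are chosen here for illustration only.

```python
from itertools import product


def is_de_bruijn(seq: str, alphabet: str, n: int) -> bool:
    """Return True if seq, read cyclically, contains every length-n string
    over alphabet exactly once."""
    k = len(alphabet)
    if len(seq) != k ** n:
        return False
    # Append the first n - 1 symbols so that plain slicing sees the wrap-around.
    extended = seq + seq[: n - 1]
    windows = {extended[i : i + n] for i in range(len(seq))}
    expected = {"".join(p) for p in product(alphabet, repeat=n)}
    # k**n windows covering k**n distinct strings must each occur exactly once.
    return windows == expected


print(is_de_bruijn("00010111", "01", 3))  # True
print(is_de_bruijn("00011011", "01", 3))  # False: 011 and 110 occur twice
```

The length test does the counting work: once the sequence has exactly k^n positions, covering all k^n strings forces each to occur once.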
The number of distinct de Bruijn sequences B(k, n) is (k!)^(k^(n−1)) / k^n.
The sequences are named after the Dutch mathematician Nicolaas Govert de Bruijn, who wrote about them in 1946. As he later wrote,[1] the existence of de Bruijn sequences for each order, together with the above properties, was first proved, for the case of alphabets with two elements, by Camille Flye Sainte-Marie (1894). The generalization to larger alphabets is due to Tatyana van Aardenne-Ehrenfest and de Bruijn (1951). Automata for recognizing these sequences are called de Bruijn automata and are topologically similar to some time-delay neural networks.[2]
In most applications, A = {0,1}.
History
The earliest known example of a de Bruijn sequence comes from Sanskrit prosody where, since the work of Pingala, each possible three-syllable pattern of long and short syllables is given a name, such as 'y' for short–long–long and 'm' for long–long–long. To remember these names, the mnemonic yamātārājabhānasalagām is used, in which each three-syllable pattern occurs starting at its name: 'yamātā' has a short–long–long pattern, 'mātārā' has a long–long–long pattern, and so on, until 'salagām' which has a short–short–long pattern. This mnemonic, equivalent to a de Bruijn sequence on binary 3-tuples, is of unknown antiquity, but is at least as old as Charles Philip Brown's 1869 book on Sanskrit prosody that mentions it and considers it "an ancient line, written by Pāṇini".[3]
In 1894, A. de Rivière raised the question, in an issue of the French problem journal L'Intermédiaire des Mathématiciens, of the existence of a circular arrangement of zeroes and ones of size 2^n that contains all 2^n binary sequences of length n. The problem was solved (in the affirmative), along with the count of distinct solutions, by Camille Flye Sainte-Marie in the same year.[1] This was largely forgotten, and Martin (1934) proved the existence of such cycles for general alphabet size in place of 2, with an algorithm for constructing them. Finally, when in 1944 Kees Posthumus conjectured the count for binary sequences, de Bruijn proved the conjecture in 1946, through which the problem became well-known.[1]
Karl Popper independently describes these objects in his The Logic of Scientific Discovery (1934), calling them "shortest random-like sequences".[4]
Examples
- Taking A = {0, 1}, there are two distinct B(2, 3): 00010111 and 11101000, one being the reverse or negation of the other.
- Two of the 2048 possible B(2, 5) in the same alphabet are 00000100011001010011101011011111 and 00000101001000111110111001101011.
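The counts quoted in these examples agree with the closed-form count (k!)^(k^(n−1)) / k^n. The sketch below evaluates that formula and, for the small case B(2, 3), confirms it by exhaustive search; both function names are hypothetical, and rotations of one cycle are counted as the same sequence.

```python
from math import factorial


def count_de_bruijn(k: int, n: int) -> int:
    """Closed-form count of distinct cyclic B(k, n): (k!)^(k^(n-1)) / k^n."""
    return factorial(k) ** (k ** (n - 1)) // k ** n


def brute_force_binary_count(n: int) -> int:
    """Count binary de Bruijn cycles of order n by exhaustive search,
    treating rotations of the same cycle as one sequence."""
    length = 2 ** n
    found = set()
    for value in range(2 ** length):
        s = format(value, f"0{length}b")
        doubled = s + s
        windows = {doubled[i : i + n] for i in range(length)}
        if len(windows) == length:  # all 2**n cyclic windows are distinct
            # Canonical representative: the smallest rotation.
            found.add(min(doubled[i : i + length] for i in range(length)))
    return len(found)


print(count_de_bruijn(2, 3), brute_force_binary_count(3))  # 2 2
print(count_de_bruijn(2, 5))                               # 2048
```

The exhaustive search is only practical for very small n, since it enumerates all 2^(2^n) binary strings.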
Construction
De Bruijn sequences can be constructed by taking a Hamiltonian cycle of an n-dimensional de Bruijn graph over k symbols (or equivalently, an Eulerian cycle of an (n − 1)-dimensional de Bruijn graph).[5]
An alternative construction involves concatenating together, in lexicographic order, all the Lyndon words whose length divides n.[6]
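As a rough illustration of the Lyndon-word construction, the sketch below enumerates Lyndon words with Duval's algorithm (one of several ways to list them, chosen here for brevity) and concatenates those whose length divides n. The function names are illustrative, and the alphabet is assumed to be given in sorted order with distinct symbols.

```python
def lyndon_words(k: int, n: int):
    """Yield all Lyndon words of length <= n over {0, ..., k-1} in
    lexicographic order (Duval's algorithm)."""
    w = [-1]
    while w:
        w[-1] += 1
        yield list(w)
        m = len(w)
        while len(w) < n:            # extend periodically up to length n
            w.append(w[-m])
        while w and w[-1] == k - 1:  # strip trailing maximal symbols
            w.pop()


def de_bruijn_from_lyndon(alphabet: str, n: int) -> str:
    """Concatenate, in lexicographic order, the Lyndon words whose length divides n."""
    k = len(alphabet)
    return "".join(
        alphabet[i]
        for word in lyndon_words(k, n)
        if n % len(word) == 0
        for i in word
    )


print(de_bruijn_from_lyndon("01", 3))  # 00010111
print(de_bruijn_from_lyndon("ab", 4))  # aaaabaabbababbbb
```

For n = 3 over {0, 1} the words kept are 0, 001, 011 and 1; the word 01 is generated but skipped because its length does not divide 3.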
An inverse Burrows–Wheeler transform can be used to generate the required Lyndon words in lexicographic order.[7]
De Bruijn sequences can also be constructed using shift registers[8] or via finite fields.[9]
Example using de Bruijn graph
Goal: to construct a B(2, 4) de Bruijn sequence of length 2^4 = 16 using an Eulerian cycle of the 3-dimensional (n − 1 = 4 − 1 = 3) de Bruijn graph.
Each edge in this 3-dimensional de Bruijn graph corresponds to a sequence of four digits: the three digits that label the vertex that the edge is leaving followed by the one that labels the edge. If one traverses the edge labeled 1 from 000, one arrives at 001, thereby indicating the presence of the subsequence 0001 in the de Bruijn sequence. To traverse each edge exactly once is to use each of the 16 four-digit sequences exactly once.
For example, suppose we follow the following Eulerian path through these vertices:
- 000, 000, 001, 011, 111, 111, 110, 101, 011,
- 110, 100, 001, 010, 101, 010, 100, 000.
These are the output sequences of length n = 4:
- 0 0 0 0
- _ 0 0 0 1
- _ _ 0 0 1 1
This corresponds to the following de Bruijn sequence:
- 0 0 0 0 1 1 1 1 0 1 1 0 0 1 0 1
The eight vertices appear in the sequence in the following way:
{0 0 0 0} 1 1 1 1 0 1 1 0 0 1 0 1
0 {0 0 0 1} 1 1 1 0 1 1 0 0 1 0 1
0 0 {0 0 1 1} 1 1 0 1 1 0 0 1 0 1
0 0 0 {0 1 1 1} 1 0 1 1 0 0 1 0 1
0 0 0 0 {1 1 1 1} 0 1 1 0 0 1 0 1
0 0 0 0 1 {1 1 1 0} 1 1 0 0 1 0 1
0 0 0 0 1 1 {1 1 0 1} 1 0 0 1 0 1
0 0 0 0 1 1 1 {1 0 1 1} 0 0 1 0 1
0 0 0 0 1 1 1 1 {0 1 1 0} 0 1 0 1
0 0 0 0 1 1 1 1 0 {1 1 0 0} 1 0 1
0 0 0 0 1 1 1 1 0 1 {1 0 0 1} 0 1
0 0 0 0 1 1 1 1 0 1 1 {0 0 1 0} 1
0 0 0 0 1 1 1 1 0 1 1 0 {0 1 0 1}
0} 0 0 0 1 1 1 1 0 1 1 0 0 {1 0 1 ...
... 0 0} 0 0 1 1 1 1 0 1 1 0 0 1 {0 1 ...
... 0 0 0} 0 1 1 1 1 0 1 1 0 0 1 0 {1 ...
...and then we return to the starting point. Each of the eight 3-digit sequences (corresponding to the eight vertices) appears exactly twice, and each of the sixteen 4-digit sequences (corresponding to the 16 edges) appears exactly once.
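The graph construction can be sketched in code. The example below finds an Eulerian cycle with Hierholzer's algorithm, which is one standard choice and is not named in the text above; the function name is illustrative. Because the order in which edges are tried is arbitrary, the cycle it finds need not be the one traced in this example.

```python
from itertools import product


def de_bruijn_eulerian(alphabet: str, n: int) -> str:
    """Build a B(k, n) from an Eulerian cycle of the de Bruijn graph whose
    vertices are the strings of length n - 1 (Hierholzer's algorithm)."""
    if n == 1:
        return alphabet  # the graph degenerates; each symbol once suffices
    # Edge labels not yet used, per vertex.
    unused = {"".join(v): list(alphabet) for v in product(alphabet, repeat=n - 1)}
    start = alphabet[0] * (n - 1)
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        if unused[v]:
            # Follow an unused edge: drop the first symbol, append the label.
            stack.append((v + unused[v].pop())[1:])
        else:
            circuit.append(stack.pop())  # dead end: vertex is finished
    circuit.reverse()  # vertices of the Eulerian cycle; first == last
    # Each edge contributes its label, which is the last symbol of its target.
    return "".join(v[-1] for v in circuit[1:])


s = de_bruijn_eulerian("01", 4)
print(s, len(s))
```

Every edge is traversed exactly once, so each of the k^n strings of length n appears exactly once when the output is read cyclically.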
Example using inverse Burrows–Wheeler transform
Mathematically, an inverse Burrows–Wheeler transform on a word w generates a multiset of equivalence classes consisting of strings and their rotations.[7] These equivalence classes of strings each contain a Lyndon word as a unique minimum element, so the inverse Burrows–Wheeler transform can be considered to generate a set of Lyndon words. It can be shown that if we perform the inverse Burrows–Wheeler transform on a word w consisting of the size-k alphabet repeated k^(n−1) times (so that it will produce a word the same length as the desired de Bruijn sequence), then the result will be the set of all Lyndon words whose length divides n. It follows that arranging these Lyndon words in lexicographic order will yield a de Bruijn sequence B(k, n), and that this will be the first de Bruijn sequence in lexicographic order. The following method can be used to perform the inverse Burrows–Wheeler transform, using its standard permutation:
- Sort the characters in the string w, yielding a new string w'
- Position the string w' above the string w, and map each letter's position in w' to its position in w while preserving order. This process defines the Standard Permutation.
- Write this permutation in cycle notation with the smallest position in each cycle first, and the cycles sorted in increasing order.
- For each cycle, replace each number with the corresponding letter from string w' in that position.
- Each cycle has now become a Lyndon word, and they are arranged in lexicographic order, so dropping the parentheses yields the first de Bruijn sequence.
For example, to construct the smallest B(2, 4) de Bruijn sequence of length 2^4 = 16, repeat the alphabet (ab) 8 times, yielding w = abababababababab. Sort the characters in w, yielding w' = aaaaaaaabbbbbbbb. Position w' above w and map each element in w' to the corresponding element in w (the i-th a of w' to the i-th a of w, and likewise for b). Number the columns 1 to 16 so that the cycles of the permutation can be read off:
Starting from the left, the cycles of the standard permutation are: (1) (2 3 5 9) (4 7 13 10) (6 11) (8 15 14 12) (16).
Then, replacing each number by the corresponding letter in w' from that column yields: (a)(aaab)(aabb)(ab)(abbb)(b).
These are all of the Lyndon words whose length divides 4, in lexicographic order, so dropping the parentheses gives B(2,4) = aaaabaabbababbbb.
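The steps of this method translate fairly directly into code. The sketch below is one possible rendering, with an illustrative function name; it uses 0-based positions internally (the text numbers columns from 1) and assumes the alphabet is passed in sorted order with distinct symbols.

```python
def de_bruijn_via_inverse_bwt(alphabet: str, n: int) -> str:
    """Smallest B(k, n) via the standard permutation of w = alphabet * k^(n-1)."""
    k = len(alphabet)
    w = alphabet * k ** (n - 1)   # e.g. "ab" repeated 8 times for B(2, 4)
    w_sorted = sorted(w)          # the string w'
    # Standard permutation: the j-th occurrence of a letter in w' maps to the
    # j-th occurrence of that letter in w. Sorting the positions of w by
    # letter with a stable sort yields exactly this mapping.
    perm = sorted(range(len(w)), key=lambda i: w[i])
    seen = [False] * len(w)
    out = []
    for start in range(len(w)):   # cycles in order of their smallest position
        i = start
        while not seen[i]:
            seen[i] = True
            out.append(w_sorted[i])  # letter of w' at this position
            i = perm[i]
    return "".join(out)


print(de_bruijn_via_inverse_bwt("ab", 4))  # aaaabaabbababbbb
```

Scanning start positions in increasing order and skipping positions already visited produces the cycles with their smallest element first, sorted as the third step requires.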
Algorithm
The following Python code calculates a de Bruijn sequence, given k and n, based on an algorithm from Frank Ruskey's Combinatorial Generation.[10]
def de_bruijn(k, n: int) -> str:
    """de Bruijn sequence for alphabet k
    and subsequences of length n.
    """
    try:
        # let's see if k can be cast to an integer;
        # if so, make our alphabet a list
        _ = int(k)
        alphabet = list(map(str, range(k)))

    except (ValueError, TypeError):
        alphabet = k
        k = len(k)

    a = [0] * k * n
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


print(de_bruijn(2, 3))
print(de_bruijn("abcd", 2))
which prints
00010111
aabacadbbcbdccdd
Note that these sequences are understood to "wrap around" in a cycle. For example, the first sequence contains 110 and 100 in this fashion.
Uses
B(10, 3) with digits read from top to bottom, then left to right;[11] appending "00" yields a string to brute-force a 3-digit combination lock.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
001 | |||||||||
002 | |||||||||
003 | |||||||||
004 | |||||||||
005 | |||||||||
006 | |||||||||
007 | |||||||||
008 | |||||||||
009 | |||||||||
011 | |||||||||
012 | 112 | ||||||||
013 | 113 | ||||||||
014 | 114 | ||||||||
015 | 115 | ||||||||
016 | 116 | ||||||||
017 | 117 | ||||||||
018 | 118 | ||||||||
019 | 119 | ||||||||
021 | |||||||||
022 | 122 | ||||||||
023 | 123 | 223 | |||||||
024 | 124 | 224 | |||||||
025 | 125 | 225 | |||||||
026 | 126 | 226 | |||||||
027 | 127 | 227 | |||||||
028 | 128 | 228 | |||||||
029 | 129 | 229 | |||||||
031 | |||||||||
032 | 132 | ||||||||
033 | 133 | 233 | |||||||
034 | 134 | 234 | 334 | ||||||
035 | 135 | 235 | 335 | ||||||
036 | 136 | 236 | 336 | ||||||
037 | 137 | 237 | 337 | ||||||
038 | 138 | 238 | 338 | ||||||
039 | 139 | 239 | 339 | ||||||
041 | |||||||||
042 | 142 | ||||||||
043 | 143 | 243 | |||||||
044 | 144 | 244 | 344 | ||||||
045 | 145 | 245 | 345 | 445 | |||||
046 | 146 | 246 | 346 | 446 | |||||
047 | 147 | 247 | 347 | 447 | |||||
048 | 148 | 248 | 348 | 448 | |||||
049 | 149 | 249 | 349 | 449 | |||||
051 | |||||||||
052 | 152 | ||||||||
053 | 153 | 253 | |||||||
054 | 154 | 254 | 354 | ||||||
055 | 155 | 255 | 355 | 455 | |||||
056 | 156 | 256 | 356 | 456 | 556 | ||||
057 | 157 | 257 | 357 | 457 | 557 | ||||
058 | 158 | 258 | 358 | 458 | 558 | ||||
059 | 159 | 259 | 359 | 459 | 559 | ||||
061 | |||||||||
062 | 162 | ||||||||
063 | 163 | 263 | |||||||
064 | 164 | 264 | 364 | ||||||
065 | 165 | 265 | 365 | 465 | |||||
066 | 166 | 266 | 366 | 466 | 566 | ||||
067 | 167 | 267 | 367 | 467 | 567 | 667 | |||
068 | 168 | 268 | 368 | 468 | 568 | 668 | |||
069 | 169 | 269 | 369 | 469 | 569 | 669 | |||
071 | |||||||||
072 | 172 | ||||||||
073 | 173 | 273 | |||||||
074 | 174 | 274 | 374 | ||||||
075 | 175 | 275 | 375 | 475 | |||||
076 | 176 | 276 | 376 | 476 | 576 | ||||
077 | 177 | 277 | 377 | 477 | 577 | 677 | |||
078 | 178 | 278 | 378 | 478 | 578 | 678 | 778 | ||
079 | 179 | 279 | 379 | 479 | 579 | 679 | 779 | ||
081 | |||||||||
082 | 182 | ||||||||
083 | 183 | 283 | |||||||
084 | 184 | 284 | 384 | ||||||
085 | 185 | 285 | 385 | 485 | |||||
086 | 186 | 286 | 386 | 486 | 586 | ||||
087 | 187 | 287 | 387 | 487 | 587 | 687 | |||
088 | 188 | 288 | 388 | 488 | 588 | 688 | 788 | ||
089 | 189 | 289 | 389 | 489 | 589 | 689 | 789 | 889 | |
091 | |||||||||
092 | 192 | ||||||||
093 | 193 | 293 | |||||||
094 | 194 | 294 | 394 | ||||||
095 | 195 | 295 | 395 | 495 | |||||
096 | 196 | 296 | 396 | 496 | 596 | ||||
097 | 197 | 297 | 397 | 497 | 597 | 697 | |||
098 | 198 | 298 | 398 | 498 | 598 | 698 | 798 | ||
099 | 199 | 299 | 399 | 499 | 599 | 699 | 799 | 899 | (00) |
The sequence can be used to shorten a brute-force attack on a PIN-like code lock that does not have an "enter" key and accepts the last n digits entered. For example, a digital door lock with a 4-digit code (each digit having 10 possibilities, from 0 to 9) is covered by any B(10, 4) sequence, which has length 10000. Therefore, at most 10000 + 3 = 10003 presses are needed to open the lock (the extra 3 because the cyclic sequence has to be unrolled into a linear one). Trying all codes separately would require 4 × 10000 = 40000 presses.
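The unrolling step can be illustrated on a small case. The sketch below takes the binary B(2, 3) from the examples, repeats its first n − 1 symbols at the end, and checks that every 3-bit code then appears as an ordinary substring; the helper presses is a hypothetical name used only to reproduce the arithmetic above.

```python
from itertools import product

cyclic = "00010111"  # a B(2, 3)
n = 3
# Unroll the cycle: repeat the first n - 1 symbols at the end.
linear = cyclic + cyclic[: n - 1]

codes = {"".join(p) for p in product("01", repeat=n)}
# Every code now occurs as a plain (non-wrapping) substring.
assert all(code in linear for code in codes)


def presses(k: int, n: int):
    """(presses using a de Bruijn sequence, presses entering each code separately)."""
    return k ** n + n - 1, n * k ** n


print(linear, len(linear))  # 0001011100 10
print(presses(10, 4))       # (10003, 40000)
```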
The symbols of a de Bruijn sequence written around a circular object (such as a wheel of a robot) can be used to identify its angle by examining the n consecutive symbols facing a fixed point. This angle-encoding problem is known as the "rotating drum problem".[12] Gray codes can be used as similar rotary positional encoding mechanisms.
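A minimal sketch of such an angle lookup follows, using the B(2, 4) sequence 0000111101100101 constructed in the graph example. The names SEQ, position and angle_degrees are illustrative, and the symbols are assumed to be evenly spaced around the wheel.

```python
SEQ = "0000111101100101"  # B(2, 4) written around the wheel
N = 4                     # number of consecutive symbols the sensor reads

# Map each window of N symbols (with wrap-around) to the index of its first symbol.
extended = SEQ + SEQ[: N - 1]
position = {extended[i : i + N]: i for i in range(len(SEQ))}


def angle_degrees(window: str) -> float:
    """Angle of the wheel, assuming evenly spaced symbols."""
    return 360.0 * position[window] / len(SEQ)


print(angle_degrees("0000"))  # 0.0
print(angle_degrees("1010"))  # 292.5
```

Because each window occurs exactly once, the dictionary has one entry per position and the lookup is unambiguous.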
De Bruijn cycles are of general use in neuroscience and psychology experiments that examine the effect of stimulus order upon neural systems,[13] and can be specially crafted for use with functional magnetic resonance imaging.[14]
A de Bruijn sequence can be used to quickly find the index of the least significant set bit ("right-most 1") or the most significant set bit ("left-most 1") in a word using bitwise operations.[15] An example of returning the index of the least significant set bit of a 32-bit unsigned integer is given below, using bit manipulation and multiplication.
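#include <stdint.h>  /* uint32_t */

unsigned int v;  /* 32-bit word whose lowest set bit is sought */
int r;           /* result: index of that bit */
static const int MultiplyDeBruijnBitPosition[32] =
{
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
/* v & -v isolates the lowest set bit; the multiplication then shifts the
   de Bruijn constant so that its top five bits select the table entry. */
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];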
The index of the LSB in v is stored in r; if v has no set bits, the result is 0. The constant 0x077CB531U in the expression is the B(2, 5) sequence 0000 0111 0111 1100 1011 0101 0011 0001 (spaces added for clarity).
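To see where the lookup table comes from, it can be rebuilt from the constant itself. The Python sketch below emulates 32-bit unsigned arithmetic with a mask; the names DEBRUIJN, table and lowest_set_bit_index are illustrative and not part of the C snippet.

```python
DEBRUIJN = 0x077CB531  # the B(2, 5) constant from the C snippet
MASK = 0xFFFFFFFF      # emulate 32-bit unsigned wrap-around

# Multiplying by 2**i shifts the constant left by i bits, so the top five
# bits of the product are the i-th 5-bit window of the sequence. All 32
# windows are distinct, so each one identifies a unique bit index i.
table = [0] * 32
for i in range(32):
    table[((DEBRUIJN << i) & MASK) >> 27] = i


def lowest_set_bit_index(v: int) -> int:
    isolated = v & -v  # keeps only the lowest set bit (0 if v == 0)
    return table[((isolated * DEBRUIJN) & MASK) >> 27]


print(table[:8])                       # [0, 1, 28, 2, 29, 14, 24, 3]
print(lowest_set_bit_index(0b101000))  # 3
```

The trailing windows are padded with the zeros shifted in from the right; this works because the constant begins with a run of zeros.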
An example of returning the index of the most significant bit set from a 32 bit unsigned integer is given below using bit manipulation and multiplication.
#include <stdint.h>

/* Clear every bit of n except the most significant set bit. */
uint32_t keepHighestBit( uint32_t n )
{
    n |= (n >> 1);
    n |= (n >> 2);
    n |= (n >> 4);
    n |= (n >> 8);
    n |= (n >> 16);
    return n - (n >> 1);
}

/* Index of the most significant set bit of b. */
uint8_t highestBitIndex( uint32_t b )
{
    static const uint32_t deBruijnMagic = 0x06EB14F9;
    static const uint8_t deBruijnTable[32] = {
        0, 1, 16, 2, 29, 17, 3, 22, 30, 20, 18, 11, 13, 4, 7, 23,
        31, 15, 28, 21, 19, 10, 12, 6, 14, 27, 9, 5, 26, 8, 25, 24,
    };
    return deBruijnTable[(keepHighestBit(b) * deBruijnMagic) >> 27];
}
f-fold de Bruijn sequences
An f-fold n-ary de Bruijn sequence is an extension of the notion of an n-ary de Bruijn sequence: a cyclic sequence of length f·k^n that contains every possible subsequence of length n exactly f times. For example, for n = 2 the cyclic sequences 11100010 and 11101000 are two-fold binary de Bruijn sequences. The number of two-fold de Bruijn sequences is known for a few small values of n.[16]
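The two example sequences can be checked by counting cyclic windows. They have length 8 = 2·2^2, so they are two-fold sequences of order n = 2. The function name below is illustrative.

```python
from collections import Counter
from itertools import product


def is_f_fold_de_bruijn(seq: str, alphabet: str, n: int, f: int) -> bool:
    """True if every length-n string occurs exactly f times in cyclic seq."""
    if len(seq) != f * len(alphabet) ** n:
        return False
    extended = seq + seq[: n - 1]  # make the wrap-around visible to slicing
    counts = Counter(extended[i : i + n] for i in range(len(seq)))
    return all(counts["".join(p)] == f for p in product(alphabet, repeat=n))


print(is_f_fold_de_bruijn("11100010", "01", 2, 2))  # True
print(is_f_fold_de_bruijn("11101000", "01", 2, 2))  # True
```

With f = 1 the same check reduces to the ordinary de Bruijn property.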
De Bruijn torus
A de Bruijn torus is a toroidal array with the property that every k-ary m-by-n matrix occurs exactly once.
Such a pattern can be used for two-dimensional positional encoding in a fashion analogous to that described above for rotary encoding. Position can be determined by examining the m-by-n matrix directly adjacent to the sensor, and calculating its position on the de Bruijn torus.
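As a small illustration of two-dimensional positional encoding, the sketch below uses a 4-by-4 binary array in which every 2-by-2 window (with wrap-around in both directions) occurs exactly once. The array is an illustrative example supplied here, not one taken from the text, and the function names are hypothetical.

```python
TORUS = [
    "0100",
    "0111",
    "1110",
    "0010",
]  # illustrative 4x4 binary array; all 16 binary 2x2 windows occur once


def window(torus, r, c, m, n):
    """The m-by-n window whose top-left cell is (r, c), wrapping around both edges."""
    rows, cols = len(torus), len(torus[0])
    return tuple(
        "".join(torus[(r + i) % rows][(c + j) % cols] for j in range(n))
        for i in range(m)
    )


def window_positions(torus, m, n):
    """Map each m-by-n window to the (row, col) of its top-left cell."""
    return {
        window(torus, r, c, m, n): (r, c)
        for r in range(len(torus))
        for c in range(len(torus[0]))
    }


positions = window_positions(TORUS, 2, 2)
print(len(positions) == 2 ** 4)  # True: no window repeats
print(positions[("11", "01")])   # (2, 1)
```

A sensor that reads the 2-by-2 patch beneath it can recover its position with a single dictionary lookup.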
De Bruijn decoding
Computing the position of a particular unique tuple or matrix in a de Bruijn sequence or torus is known as the de Bruijn Decoding Problem. Efficient O(n log n) decoding algorithms exist for special, recursively constructed sequences[17] and extend to the two dimensional case.[18] De Bruijn decoding is of interest, e.g., in cases where large sequences or tori are used for positional encoding.
Notes
- de Bruijn (1975).
- Giles, C. Lee; Horne, Bill G.; Lin, Tsungnan (1995). "Learning a class of large finite state machines with a recurrent neural network" (PDF). Neural Networks. 8 (9): 1359–1365.
- Brown (1869); Stein (1963); Kak (2000); Knuth (2006); Hall (2008).
- Popper (2002).
- Klein (2013).
- According to Berstel & Perrin (2007), the sequence generated in this way was first described (with a different generation method) by Martin (1934), and the connection between it and Lyndon words was observed by Fredricksen & Maiorana (1978).
- Higgins (2012).
- Goresky & Klapper (2012).
- Ralston (1982), pp. 136–139.
- "De Bruijn sequences". Sage. Retrieved 2016-11-03.
- http://hakank.org/comb/debruijn.cgi?k=10&n=3
- van Lint & Wilson (2001).
- Aguirre, Mattar & Magis-Weinberg (2011).
- "De Bruijn cycle generator".
- Anderson (1997–2009); Busch (2009)
- Osipov (2016).
- Tuliani (2001).
- Hurlbert & Isaak (1993).
References
- van Aardenne-Ehrenfest, Tanja; de Bruijn, Nicolaas Govert (1951). "Circuits and trees in oriented linear graphs" (PDF). Simon Stevin. 28: 203–217. MR 0047311.
- Aguirre, G. K.; Mattar, M. G.; Magis-Weinberg, L. (2011). "de Bruijn cycles for neural decoding". NeuroImage. 56: 1293–1300.
- Anderson, Sean Eron (1997–2009). "Bit Twiddling Hacks". Stanford University. Retrieved 2009-02-12.
- Berstel, Jean; Perrin, Dominique (2007). "The origins of combinatorics on words" (PDF). European Journal of Combinatorics. 28 (3): 996–1022. doi:10.1016/j.ejc.2005.07.019. MR 2300777.
- Brown, C. P. (1869). Sanskrit Prosody and Numerical Symbols Explained. p. 28.
- de Bruijn, Nicolaas Govert (1946). "A combinatorial problem" (PDF). Proc. Koninklijke Nederlandse Akademie V. Wetenschappen. 49: 758–764. MR 0018142, Indagationes Mathematicae 8: 461–467.
- de Bruijn, Nicolaas Govert (1975). "Acknowledgement of Priority to C. Flye Sainte-Marie on the counting of circular arrangements of 2^n zeros and ones that show each n-letter word exactly once" (PDF). T.H.-Report 75-WSK-06. Technological University Eindhoven.
- Busch, Philip (2009). "Computing Trailing Zeros HOWTO". Retrieved 2015-01-29.
- Flye Sainte-Marie, Camille (1894). "Solution to question nr. 48". L'Intermédiaire des Mathématiciens. 1: 107–110.
- Goresky, Mark; Klapper, Andrew (2012). "8.2.5 Shift register generation of de Bruijn sequences". Algebraic Shift Register Sequences. Cambridge University Press. pp. 174–175. ISBN 978-1-10701499-2.
- Hall, Rachel W. (2008). "Math for poets and drummers" (PDF). Math Horizons. 15 (3): 10–11. doi:10.1080/10724117.2008.11974752. Archived from the original (PDF) on 2012-02-12. Retrieved 2008-10-22.
- Higgins, Peter (November 2012). "Burrows-Wheeler transforms and de Bruijn words" (PDF). Retrieved 2017-02-11.
- Hurlbert, Glenn; Isaak, Garth (1993). "On the de Bruijn torus problem" (PDF). Journal of Combinatorial Theory. Series A. 64 (1): 50–62. doi:10.1016/0097-3165(93)90087-O. MR 1239511. Archived from the original (PDF) on 2006-09-05. Retrieved 2006-07-16.
- Kak, Subhash (2000). "Yamātārājabhānasalagāṃ an interesting combinatoric sūtra" (PDF). Indian Journal of History of Science. 35 (2): 123–127. Archived from the original (PDF) on 2014-10-29.
- Klein, Andreas (2013). Stream Ciphers. Springer. p. 59. ISBN 978-1-44715079-4.
- Knuth, Donald Ervin (2006). The Art of Computer Programming, Fascicle 4: Generating All Trees – History of Combinatorial Generation. Addison–Wesley. p. 50. ISBN 978-0-321-33570-8.
- Fredricksen, Harold; Maiorana, James (1978). "Necklaces of beads in k colors and k-ary de Bruijn sequences". Discrete Mathematics. 23 (3): 207–210. doi:10.1016/0012-365X(78)90002-X. MR 0523071.
- Martin, Monroe H. (1934). "A problem in arrangements" (PDF). Bulletin of the American Mathematical Society. 40 (12): 859–864. doi:10.1090/S0002-9904-1934-05988-3. MR 1562989.
- Osipov, Vladimir (2016). "Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences". Journal of Statistical Physics. 164 (1): 142–165. arXiv:1601.02097. Bibcode:2016JSP...164..142O. doi:10.1007/s10955-016-1537-5. ISSN 1572-9613.
- Popper, Karl (2002) [1934]. The logic of scientific discovery. Routledge. p. 294. ISBN 978-0-415-27843-0.
- Ralston, Anthony (1982). "de Bruijn sequences—a model example of the interaction of discrete mathematics and computer science". Mathematics Magazine. 55 (3): 131–143. doi:10.2307/2690079. JSTOR 2690079. MR 0653429.
- Stein, Sherman K. (1963). "Yamátárájabhánasalagám". The Man-made Universe: An Introduction to the Spirit of Mathematics. pp. 110–118. Reprinted in Wardhaugh, Benjamin, ed. (2012), A Wealth of Numbers: An Anthology of 500 Years of Popular Mathematics Writing, Princeton University Press, pp. 139–144.
- Tuliani, Jonathan (2001). "de Bruijn sequences with efficient decoding algorithms". Discrete Mathematics. 226 (1–3): 313–336. doi:10.1016/S0012-365X(00)00117-5. MR 1802599.
- van Lint, J. H.; Wilson, Richard Michael (2001). A Course in Combinatorics. Cambridge University Press. p. 71. ISBN 978-0-52100601-9.
External links
- Weisstein, Eric W. "de Bruijn Sequence". MathWorld.
- OEIS sequence A166315 (Lexicographically smallest binary de Bruijn sequences)
- De Bruijn sequence
- CGI generator
- Applet generator
- Javascript generator and decoder. Implementation of J. Tuliani's algorithm.
- Door code lock
- Minimal arrays containing all sub-array combinations of symbols: De Bruijn sequences and tori