Edit:

Since the context is not given, it is not clear if your code calls `std::swap()`

or another `swap(a,b)`

algorithm like

```
T tmp = a; a = b; b = tmp;
```

When `a`

and `b`

are vectors of 1000 `int`

s each, this would copy all vector elements 3 times. The specialized version of `std::swap()`

for containers like `std::vector<T>`

call the container `a.swap(b)`

method instead, essentially swapping only the dynamic data pointers of the containers.

Also, for different iterator types the `std::rotate()`

implementation can utilize some optimizations (see my older, possibly misleading answer below).

Caveat: The implementation of `std::rotate()`

is implementation-dependent.
For different iterator categories, different algorithms can be utilized
(e.g. look for `__rotate(`

in `bits/stl_algo.h`

header of GNU g++).

To shift `n`

elements by `m=std::distance(first,middle)`

a simple (naive) algorithm like m rotations by one element needs **O(n*m)** moving or copying operations. But only **O(n)** moves are needed, when each element is directly placed to its right position, which results in a (roughly) **m** times faster algorithm.

An example for illustration: Rotate a string `s = "abcdefg"`

by three elements:

```
abcdefg : store 'a' in temporary place
dbcdefg : move s[3] to s[0] (where it belongs in the end, directly)
dbcgefg : move s[6] to s[3]
dbcgefc : move s[9%7] to s[6] (wrapping index modulo container size: 9%7 == 2)
dbfgefc : move s[5] to s[2]
dbfgebc : move s[1] to s[5] (another wrapping around)
defgebc : move s[4] to s[1]
defgabc : move 'a' from temporary place to s[4]
```

For `n`

and `m`

with greatest common divisor 1 you are done now. Otherwise you have to repeat that scheme `n/m`

time for the first `m`

consecutive elements (`n > m`

assumed here).
This little more complicated algorithm is much faster.

For **bidirectional iterators** another legendary O(3n) algorithm can be used, referred to as "flipping hands". According to Jon Bentley's book Programming Pearls it was used in early UNIX editors for moving text:

Place your hands in front of you, one above the other, thumbs up. Now

- Turn one hand.
- Turn the other.
- Turn both, connected to each other.

In code:

```
reverse(first, middle);
reverse(middle, last);
reverse(first, last);
```

For **random access iterators** large chunks of memory can be relocated by `swap_ranges()`

(or `memmove()`

operations for POD types).

Microoptimization by utilising assembler operations can give a small extra amount of acceleration, it can be done on top of the fasted algorithm.

Algorithms using consecutive elements instead of "hopping around" in memory also result in smaller number of cache misses on modern computer architectures.

`std::rotate`

is implemented using architecture-specific assembly, or maybe you're just not compiling using optimization flags`std::rotate`

?`std::swap`

implementation and record how many times it's called.