We report a compact rigid instrument capable of delivering en-face optical coherence tomography (OCT) images alongside (epi)-fluorescence endomicroscopy (FEM) images by means of a robotic scanning device. Two working imaging channels are included: one for a one-dimensional-scanning, forward-viewing OCT probe and another for a fiber bundle used for the FEM system. The robotic scanning system provides the second axis of scanning for the OCT channel while allowing the field of view (FoV) of the FEM channel to be increased by mosaicking. The OCT channel has axial and lateral resolutions of 25 μm and 60 μm, respectively, and can provide en-face images with an FoV of 1.6 × 2.7 mm². The FEM channel has a lateral resolution better than 8 μm and can generate an FoV of 0.53 × 3.25 mm² through mosaicking. The reproducibility of the scanning, determined using phantoms, was better than the lateral resolution of the OCT channel. Combined OCT and FEM imaging was validated with ex-vivo ovine and porcine tissues, with the instrument mounted on an arm to ensure constant contact of the probe with the tissue. The OCT imaging system alone was validated for in-vivo human dermal imaging with the handheld instrument. In both cases, the instrument was capable of resolving fine features such as the sweat glands in human dermal tissue and the alveoli in porcine lung tissue.