Friday, 31 May 2013

How to read and write Excel cells with OpenXML and C#

Recently I needed to update an Excel spreadsheet and then retrieve some recalculated values. The tricky part was that some cells that I needed to retrieve information from were formula cells (e.g. =A1+5). I needed to update some other cells first and then get the value of recalculated formula.

Unfortunately, this is not directly possible with OpenXML. If you simply try updating some cells and then retrieving the relying ones, you will get the original values for those cells, not recalculated. . This is because formula cells don't store any values, just... formulas. You can only force recalculation by opening your document in the Excel application.

Knowing this I implemented a Refresh method, that opens the Excel app in background and then closes it immediately and saves changes. Below I present my sample code.

Prerequisites
In order to compile the following code you will need the Microsoft.OpenXML SDK 2.0 (DocumentFormat.OpenXML NuGet package) and reference to Microsoft.Office.Interop.Excel (used for opening the Excel app to recalculate formulas).

Solution
Let's start with an interface for my ExcelDocument class:

/// <summary>
/// Interface defining Excel Document methods
/// </summary>
public interface IExcelDocument
{
    /// <summary>
    /// Reads a value of a spreadsheet cell
    /// </summary>
    /// <param name="sheetName">Name of the spreadsheet</param>
    /// <param name="cellCoordinates">Cell coordinates e.g. A1</param>
    /// <returns>Value of the specified cell</returns>
    CellValue ReadCell(string sheetName, string cellCoordinates);

    /// <summary>
    /// Updates a value of a spreadsheet cell
    /// </summary>
    /// <param name="sheetName">Name of the spreadsheet</param>
    /// <param name="cellCoordinates">Cell coordinates e.g. A1</param>
    /// <param name="cellValue">New cell value</param>
    void UpdateCell(string sheetName, string cellCoordinates, object cellValue);

    /// <summary>
    /// Refreshes the workbook to recalculate all formula cell values
    /// </summary>
    void Refresh();
}
Once we have the interface we need its implementation:
public class ExcelDocument : IExcelDocument
{
    private readonly string _filePath;

    public ExcelDocument(string filePath)
    {
        _filePath = filePath;
    }

    /// <see cref="IExcelDocument.ReadCell" />
    public CellValue ReadCell(string sheetName, string cellCoordinates)
    {
        using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Open(_filePath, false))
        {
            Cell cell = GetCell(excelDoc, sheetName, cellCoordinates);
            return cell.CellValue;
        }
    }

    /// <see cref="IExcelDocument.UpdateCell" />
    public void UpdateCell(string sheetName, string cellCoordinates, object cellValue)
    {
        using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Open(_filePath, true))
        {
            // tell Excel to recalculate formulas next time it opens the doc
            excelDoc.WorkbookPart.Workbook.CalculationProperties.ForceFullCalculation = true;
            excelDoc.WorkbookPart.Workbook.CalculationProperties.FullCalculationOnLoad = true;
            
            WorksheetPart worksheetPart = GetWorksheetPart(excelDoc, sheetName);
            Cell cell = GetCell(worksheetPart, cellCoordinates);
            cell.CellValue = new CellValue(cellValue.ToString());
            worksheetPart.Worksheet.Save();
        }
    }

    /// <summary>Refreshes an Excel document by opening it and closing in background by the Excep Application</summary>
    /// <see cref="IExcelDocument.Refresh" />
    public void Refresh()
    {
        var excelApp = new Application();
        Workbook workbook = excelApp.Workbooks.Open(Path.GetFullPath(_filePath));
        workbook.Close(true);
        excelApp.Quit();
    }

    private WorksheetPart GetWorksheetPart(SpreadsheetDocument excelDoc, string sheetName)
    {
        Sheet sheet = excelDoc.WorkbookPart.Workbook.Descendants<Sheet>().SingleOrDefault(s => s.Name == sheetName);
        if (sheet == null)
        {
            throw new ArgumentException(
                String.Format("No sheet named {0} found in spreadsheet {1}", sheetName, _filePath), "sheetName");
        }
        return (WorksheetPart) excelDoc.WorkbookPart.GetPartById(sheet.Id);
    }

   private Cell GetCell(SpreadsheetDocument excelDoc, string sheetName, string cellCoordinates)
    {
        WorksheetPart worksheetPart = GetWorksheetPart(excelDoc, sheetName);
        return GetCell(worksheetPart, cellCoordinates);
    }

    private Cell GetCell(WorksheetPart worksheetPart, string cellCoordinates)
    {
        int rowIndex = int.Parse(cellCoordinates.Substring(1));
        Row row = GetRow(worksheetPart, rowIndex);

        Cell cell = row.Elements<Cell>().FirstOrDefault(c => cellCoordinates.Equals(c.CellReference.Value));
        if (cell == null)
        {
            throw new ArgumentException(String.Format("Cell {0} not found in spreadsheet", cellCoordinates));
        }
        return cell;
    }

    private Row GetRow(WorksheetPart worksheetPart, int rowIndex)
    {
        Row row = worksheetPart.Worksheet.GetFirstChild<SheetData>().
                                Elements<Row>().FirstOrDefault(r => r.RowIndex == rowIndex);
        if (row == null)
        {
            throw new ArgumentException(String.Format("No row with index {0} found in spreadsheet", rowIndex));
        }
        return row;
    }
}
I hope the code is self-explanatory and doesn't require more comments. You can optimize it for you needs e.g. when updating/reading multiple cells at once you may want to open the doc only once. Currently my code opens it and closes for each read/update request.

2 comments:

ASK Singh said...

Thanks much for this code...

Appreciate

Mohammad Asraf Shaikh said...

How can we render HTML value in Excel using open XML ?